Running R Jobs on CHTC

ACTION REQUIRED: As of September 29th, the HTC system’s default operating system will transition to CentOS Stream 8. This guide has been updated to reflect this change. However, this transition may impact users who were running R jobs before September 29th. For more information, see the HTC Operating System Transition guide.

To best understand the below information, users should already have an understanding of:

Overview

CHTC provides several copies of R that can be used to run R code in jobs. See our list of supported versions here: CHTC Supported R

This guide details the steps needed to:

  1. Create a portable copy of your R packages
  2. Write a script that uses R and your packages
  3. Submit jobs

If you want to build your own copy of base R, see this archived page:

CHTC-Provided R Installations

CHTC provides a pre-built copy of the following versions of R:

Building on CentOS Stream 8 Linux

R version    Name of R installation file
R 3.5.1      R351.tar.gz
R 3.6.3      R363.tar.gz
R 4.0.5      R405.tar.gz
R 4.1.3      R413.tar.gz

If you need a newer version of R than is shown here, please contact us! We want to continuously add new versions of R to this list and rely on your needs to know what we should add.

If you need a specific version of R not shown in this list, especially if it is an older R version, we recommend using a Docker container with R installed to run your jobs (see CHTC’s Docker Jobs guide). The Rocker organization on Docker Hub has an excellent selection of containers with many different versions of R. Contact us with any questions about this.

1. Adding R Packages

If your code uses specific R packages (like dplyr, rjags, etc.), follow the directions below to download and prepare the packages you need for job submission. If your job does not require any extra R packages, skip to parts 2 and 3.

You are going to start an interactive job that runs on the HTC build servers and that downloads a copy of R. You will then install your packages to a folder and zip those files to return to the submit server.

These instructions are primarily about adding packages to a fresh install of R; if you want to add packages to a pre-existing package folder, there will be notes below in boxes like this one.

A. Submit an Interactive Job

Create the following special submit file on the submit server, calling it something like build.sub.

# R build file

universe = vanilla
log = interactive.log

# Choose a version of R from the table above
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/el8/R###.tar.gz

+IsBuildJob = true
requirements = (OpSysMajorVer =?= 8)
request_cpus = 1
request_memory = 4GB
request_disk = 2GB

queue

The only thing you should need to change in the above file is the name of the R###.tar.gz file in the "transfer_input_files" line. We have four versions of R available to build from; see the table above.
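The file names in the table above appear to follow a simple pattern: the letter R, the version number without its dots, then .tar.gz. As a minimal sketch under that assumption, the following bash helpers build the file name and download URL from a version number. The function names r_tarball_name and r_tarball_url are hypothetical and only for illustration; versions not listed in the table may not exist on the server.

```shell
#!/bin/bash
# Sketch: derive the tarball name and download URL from an R version number.
# r_tarball_name and r_tarball_url are hypothetical helper names.

r_tarball_name() {
    # "4.1.3" -> "R413.tar.gz" (drop the dots, add the prefix and suffix)
    local version="$1"
    echo "R${version//./}.tar.gz"
}

r_tarball_url() {
    # Same base URL as the transfer_input_files line in the submit file
    echo "http://proxy.chtc.wisc.edu/SQUID/chtc/el8/$(r_tarball_name "$1")"
}

r_tarball_url 4.1.3
```

The printed URL is what would replace the R###.tar.gz placeholder line for R 4.1.3.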

If you want to add packages to a pre-existing package directory, add the tar.gz file with the packages to the transfer_input_files line:

transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/el8/R###.tar.gz, packages.tar.gz

Once this submit file is created, you will start the interactive job by running the following command:

[alice@submit]$ condor_submit -i build.sub

It may take a few minutes for the build job to start.

B. Install the Packages

1. Set up R

Once the interactive build job starts, you should see the R installation that you specified inside the working directory:

[alice@build]$ ls -l
-rw-r--r-- 1 alice alice  78M Mar 26 12:24 R###.tar.gz
drwx------ 2 alice alice 4.0K Mar 26 12:24 tmp
drwx------ 3 alice alice 4.0K Mar 26 12:24 var

We'll now unzip the copy of R and set the PATH variable to reference that version of R:

[alice@build]$ tar -xzf R###.tar.gz
[alice@build]$ export PATH=$PWD/R/bin:$PATH
[alice@build]$ export RHOME=$PWD/R

To make sure that your setup worked, try running:

[alice@build]$ R --version

The output should match the version number that you want to be using!
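To see why the export PATH line matters, the sketch below reproduces the same setup with a stand-in "R" (a two-line script, not a real R installation) in a temporary directory. It shows that a directory prepended to PATH takes priority, and that command -v reports which R the shell will actually run.

```shell
#!/bin/bash
# Sketch: show that prepending a directory to PATH makes its "R" win.
# The stand-in "R" below is a tiny script, not a real R installation.

workdir=$(mktemp -d)
mkdir -p "$workdir/R/bin"
printf '#!/bin/bash\necho "stand-in R"\n' > "$workdir/R/bin/R"
chmod +x "$workdir/R/bin/R"

cd "$workdir" || exit 1
export PATH=$PWD/R/bin:$PATH

# command -v prints the path of the R that the shell would run
resolved=$(command -v R)
echo "$resolved"
```

In the real build job, running command -v R after the export lines should print a path ending in R/bin/R inside the job's working directory. If it prints some other location, the PATH line was probably not applied.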

If you brought along your own package directory that you previously created by following this tutorial, un-tar it here and skip the directory creation step below (i.e. you do not need to run mkdir packages because this directory already exists and should have been brought along in your submit file).

2. Install packages

First, create a directory to put your packages into:

[alice@build]$ mkdir packages

Then, tell R to use that directory for the packages you're going to install:

[alice@build]$ export R_LIBS=$PWD/packages

You can choose what name to use for this directory -- if you have different sets of packages that you use for different jobs, you could use a more descriptive name than "packages".

Then start the R console:

[alice@build]$ R

In the R terminal, install your packages using install.packages.

> install.packages("package_name")

Replace “package_name” with the name of the package you wish to install.

The first time you install a package, you will be prompted to choose a "CRAN mirror" - this is the server that R downloads the package from. Choose any US-based location.

If you need a Bioconductor package you will first need to install the Bioconductor installation manager, then use Bioconductor to install your package:

> if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
> BiocManager::install("package_name") 

After you've installed all your packages, we recommend loading each library to confirm that they installed successfully:

> library(package_name)

Repeat this step as needed to load all packages installed during your interactive session.

Then exit the R console:

> quit()
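If you install the same set of packages often, the interactive steps above could also be written to a small R script and run with Rscript inside the build job. The following is a rough sketch rather than part of the CHTC instructions: the helper name write_install_script and the file name install_packages.R are hypothetical, and the mirror URL https://cloud.r-project.org is an assumed choice that replaces the interactive mirror prompt. It relies on R_LIBS already being exported as shown earlier, so that packages land in your packages directory.

```shell
#!/bin/bash
# Sketch: generate an R script that installs, then loads, each named package.
# write_install_script and install_packages.R are hypothetical names.

write_install_script() {
    local out="$1"
    shift
    {
        # Assumed CRAN mirror; setting it avoids the interactive mirror prompt
        echo 'repo <- "https://cloud.r-project.org"'
        for pkg in "$@"; do
            printf 'install.packages("%s", repos = repo)\n' "$pkg"
            # Loading the package confirms that the install succeeded
            printf 'library("%s")\n' "$pkg"
        done
    } > "$out"
}

write_install_script install_packages.R dplyr
cat install_packages.R
```

Inside the build job, after setting PATH and R_LIBS, the generated file would be run with Rscript install_packages.R.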

C. Finish Up

1. Create a tar.gz file of your packages

Right now, if we exit the interactive job, nothing will be transferred back because we haven't created any new files in the working directory, only sub-directories. To transfer our installation back, we need to compress it into a single tarball. HTCondor will then transfer the file back, and a single compressed tarball is generally easier to transfer than an uncompressed set of directories.

Run the following command to create your own tarball of your packages:

[alice@build]$ tar -czf packages.tar.gz packages/

Again, you can use a different name for the tar.gz file, if you want.
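Before leaving the build job, it can be worth listing the tarball's contents to confirm that your packages are really inside it. The sketch below demonstrates the idea on a stand-in directory; the examplepkg folder and its placeholder file are made up for illustration.

```shell
#!/bin/bash
# Sketch: build a tarball from a stand-in package directory and list its contents.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a directory of installed packages
mkdir -p packages/examplepkg
echo "placeholder" > packages/examplepkg/DESCRIPTION

tar -czf packages.tar.gz packages/

# -t lists the archive contents without extracting anything
listing=$(tar -tzf packages.tar.gz)
echo "$listing"
```

In the real build job, only the last step is needed: tar -tzf packages.tar.gz should show one sub-directory per installed package under packages/.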

2. Finish the interactive build job

We now have our packages bundled and ready for CHTC! You can now exit the interactive job and the tar.gz file with your R packages will return to the submit server with you.

[alice@build]$ exit 

2. Creating a Script

In order to use CHTC's copy of R and the packages you have prepared in an HTCondor job, we will need to write a script that unpacks both R and the packages and then runs our R code. We will use this script as the executable of our HTCondor submit file.

A sample script appears below; the rest of this guide assumes it is saved as run_R.sh. After the first line, the lines starting with hash marks are comments. You should replace "my_script.R" with the name of the script you would like to run.

#!/bin/bash

# untar your R installation. Make sure you are using the right version!
tar -xzf R###.tar.gz
# (optional) if you have a set of packages (created in Part 1), untar them also
tar -xzf packages.tar.gz

# make sure the script will use your R installation, 
# and the working directory as its home location
export PATH=$PWD/R/bin:$PATH
export RHOME=$PWD/R
export R_LIBS=$PWD/packages

# run your script
Rscript my_script.R

If you have additional commands you would like to be run within the job, you can add them to this base script. Once your script does what you would like, give it executable permissions by running:

[alice@submit]$ chmod +x run_R.sh

Arguments in R

To pass arguments to an R script within a job, you'll need to use the following syntax in your main executable script, in place of the generic command above:

Rscript my_script.R $1 $2

Here, $1 and $2 are the first and second arguments passed to the bash script from the submit file (see below), which are then sent on to the R script. To pass more (or fewer) arguments, add or remove numbered variables ($3, $4, and so on) to match.
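The sketch below illustrates how the bash script forwards its arguments. Since R is not needed to show the mechanism, a stand-in function named show_args (hypothetical) takes the place of Rscript my_script.R and prints what it receives. It also shows "$@", standard bash syntax that forwards every argument regardless of how many there are.

```shell
#!/bin/bash
# Sketch: how a wrapper script passes its arguments on to another command.
# show_args is a stand-in for "Rscript my_script.R"; it prints one argument per line.

show_args() {
    printf 'arg: %s\n' "$@"
}

# Forward exactly two arguments, like "$1 $2" in the command above
forward_two() {
    show_args "$1" "$2"
}

# Forward however many arguments were given
forward_all() {
    show_args "$@"
}

forward_two value1 value2
forward_all a "b c" d
```

Quoting the variables keeps an argument that contains spaces (such as "b c") together as a single argument. In run_R.sh, the equivalent form would be Rscript my_script.R "$@".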

In addition, your R script will need to be able to accept arguments from the command line. There is sample code for doing this on this r-bloggers.com page and about a quarter of the way into this Software Carpentry lesson (look for print-args-trailing.R).

3. Submitting Jobs

A sample submit file can be found in our hello world example page. You should make the following changes in order to run R jobs:

  • Your executable should be the script that you wrote above.

    executable = run_R.sh
    
  • Modify the CPU/memory request lines. Test a few jobs for disk space/memory usage in order to make sure your requests for a large batch are accurate! Disk space and memory usage can be found in the log file after the job completes.
  • Change transfer_input_files to include:

    transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/el8/R###.tar.gz, packages.tar.gz, my_script.R
    
  • If your script takes arguments (see "Arguments in R" in the previous section), include those in the arguments line:

    arguments = value1 value2
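Putting the changes above together, a complete submit file might look roughly like the one written by the sketch below. This is an illustration, not CHTC's official template: the file name r_job.sub, the helper name write_submit_file, the log/output/error file names, and the resource values are all placeholders to adjust after testing, and R###.tar.gz still needs to be replaced with the file name for your R version.

```shell
#!/bin/bash
# Sketch: write an example submit file for an R job.
# write_submit_file and r_job.sub are hypothetical names; resource values are placeholders.

write_submit_file() {
    # The quoted 'EOF' keeps $(Cluster) and $(Process) literal for HTCondor to expand
    cat > "$1" <<'EOF'
# R job submit file (sketch; resource values are placeholders)
universe = vanilla
executable = run_R.sh
arguments = value1 value2

log = job_$(Cluster).log
output = job_$(Cluster)_$(Process).out
error = job_$(Cluster)_$(Process).err

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# Replace R###.tar.gz with the file name for your R version
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/chtc/el8/R###.tar.gz, packages.tar.gz, my_script.R

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue
EOF
}

submit_file="$(mktemp -d)/r_job.sub"
write_submit_file "$submit_file"
cat "$submit_file"
```

A file like this would then be submitted from the submit server with condor_submit r_job.sub.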
    

How big is your package tarball?

If your package tarball is larger than 100 MB, you should NOT transfer the tarball using transfer_input_files. Instead, you should use CHTC's web proxy, squid. To learn more about squid please see our user guide File Availability with Squid Web Proxy. To request space on squid, email the research computing facilitators at chtc@cs.wisc.edu.
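One way to check the size from the command line is sketched below. The helper name over_limit is hypothetical, and the sketch assumes that "100 MB" means 100 * 1024 * 1024 bytes; the limit can be overridden with a second argument. The demonstration uses a small temporary file in place of a real package tarball.

```shell
#!/bin/bash
# Sketch: check whether a file is larger than a size limit (default: 100 MB).
# over_limit is a hypothetical helper; 100 MB is assumed to mean 100 * 1024 * 1024 bytes.

over_limit() {
    local file="$1"
    local limit="${2:-$((100 * 1024 * 1024))}"
    local size
    # wc -c counts bytes; the arithmetic expansion strips any padding in its output
    size=$(( $(wc -c < "$file") ))
    (( size > limit ))
}

# Demonstration on a small stand-in file rather than a real package tarball
demo_file="$(mktemp -d)/packages.tar.gz"
printf 'hello' > "$demo_file"

if over_limit "$demo_file"; then
    echo "larger than 100 MB: use squid"
else
    echo "100 MB or smaller: transfer_input_files is fine"
fi
```

On the submit server, ls -lh packages.tar.gz gives the same information at a glance.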