Running R Jobs on CHTC
To best understand the information below, you should already be
familiar with:
Many CHTC users have R programs requiring R versions and
specialized packages that are not installed
on CHTC's high throughput system, which includes the CHTC Pool, the UW Grid
(flocking) and the Open Science Grid (GlideIn). In order to run R
jobs, you can build a version of R with the packages you want and use it
within your jobs.
This guide details the steps needed to:
- Build an R installation for use in your jobs
- Write a script that unpacks your R installation and
runs your R code
- Submit jobs
1. Building an R Installation
To run R jobs, you will first need to build a portable R installation
that will later go along with each of your R jobs.
A. Get the Source Code for the R Version You Want
Before running any commands on CHTC, use a browser to get the source code
for your desired version of R from CRAN.
(See the note below on supported versions.)
Under "Source Code for all Platforms", find the R-#.#.#.tar.gz file for your
desired version of R, download it to your computer, and then copy it to the submit server.
To use R version 3.3.0 or higher, you will need to compile and run your jobs
on CHTC's new CentOS 7 servers. There is information in both our compiling
guide and on this page about accessing
CentOS 7 to compile and run jobs.
If you can use an earlier version of R (3.2.# or earlier), you will be able to
compile on our Scientific Linux 6 build server and run on both Scientific Linux
6 and CentOS 7, accessing the most capacity.
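The version rule above also determines the requirements line you will put in your submit file in part 3. The hypothetical helper below sketches only that rule and prints the matching line for a given R version: 3.3.0 or newer maps to CentOS 7 (OpSysMajorVer 7), and anything older maps to Scientific Linux 6 (OpSysMajorVer 6). It assumes GNU sort, whose -V option compares version strings.

```shell
#!/bin/bash
# r_requirements (hypothetical helper): print an HTCondor requirements line
# for an R version, following this guide's rule that R 3.3.0 or newer must
# be built and run on CentOS 7, while older versions can use Scientific Linux 6.
r_requirements() {
  local version="$1" major=6
  # GNU sort -V orders version strings field by field (so 3.10.0 sorts
  # after 3.3.0); if 3.3.0 sorts first, then version >= 3.3.0.
  if [ "$(printf '%s\n' 3.3.0 "$version" | sort -V | head -n 1)" = "3.3.0" ]; then
    major=7
  fi
  printf 'requirements = (OpSys == "LINUX") && (OpSysMajorVer == %s)\n' "$major"
}
```

For example, r_requirements 3.4.1 prints requirements = (OpSys == "LINUX") && (OpSysMajorVer == 7). Builds of 3.2.# or earlier can also run on CentOS 7. This sketch simply matches the operating system of the build server.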
B. Create a Portable R Installation in an Interactive Job
Before getting started, make sure you know which major packages your code uses
(anything loaded using the library() command).
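If your code is spread over several files, a quick text search can list the packages it loads. The hypothetical helper below is a rough sketch. It only matches library() calls whose argument is a literal package name, so you still need to check by hand for packages loaded with require() or referenced as pkg::fun().

```shell
#!/bin/bash
# list_r_packages (hypothetical helper): print the unique package names
# passed to library() in the given R files. This is a plain text match:
# it skips require(), pkg::fun() and library() calls built from variables.
list_r_packages() {
  # -o prints only the matched text, -h omits the file names
  grep -ohE "library\([[:space:]]*[\"']?[A-Za-z0-9._]+" "$@" \
    | sed -E "s/library\([[:space:]]*[\"']?//" \
    | LC_ALL=C sort -u
}
```

For example, running list_r_packages *.R in the directory that holds your scripts prints one package name per line.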
Because creating an R installation can be computationally intensive, it should not be
performed on the submit server. Instead, you will create your installation
on a CHTC build server by using an interactive job.
An interactive job is essentially a job without an executable: instead,
you run the commands yourself (in this case, to install R).
Like a regular HTCondor job, once you finish your R installation on the build server,
the output files (your completed portable R installation) will be transferred back to
the submit server (so that you can use the R installation for later jobs).
Submit an Interactive Build Job
Instructions for submitting an interactive build job are
You'll need to do Step 2.
Note that you should replace the contents of the "transfer_input_files" line
with the name of the R source tarball that you downloaded.
Submit the interactive job and wait for it to start (this is Step 3
of the guide above).
Once the interactive job starts, we can install R.
To install R, we will run a configuration script that includes
an option to set the installation location. We will set the location
to our current directory, and then complete the installation by running
make and make install. (In what follows, R-3.x.x
should always be replaced by the name/version of the R source code
that you chose in Part A.)
Un-tar and move into the untarred R source directory:
[alice@build]$ tar -xzf R-3.x.x.tar.gz
[alice@build]$ cd R-3.x.x
From that directory, run the following commands. The middle one may take a while!
[alice@build]$ ./configure --prefix=$(pwd)
[alice@build]$ make
[alice@build]$ make install
After the last command finishes, move back to the
main working directory:
[alice@build]$ cd ..
The installation steps above should have generated an R installation in the
lib64 subdirectory of the installation directory (R-3.x.x/lib64/R). We
can start R by typing the path to that installation, like so:
[alice@build]$ R-3.x.x/lib64/R/bin/R
This should open up an R console, which is how we're going to
install any extra R packages. Install each of the packages your
code needs by using R's install.packages() command.
You only need to install the major packages needed by your
code; if you install a package that depends on other packages,
those will automatically be installed.
The first time, you will be prompted to choose a "CRAN mirror", which is
where R downloads the package from. Choose any mirror from the list.
Once you've installed all the packages, type q() to
exit the R session. You don't need to save the workspace.
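As an alternative to the interactive prompts, you can install several packages in one command. Pass an install.packages() call to R with its -e option, and name a CRAN mirror through the repos argument so that R does not prompt for one. The helper that builds the call is hypothetical. The mirror URL https://cloud.r-project.org is an assumption, and any CRAN mirror should work.

```shell
#!/bin/bash
# r_install_expr (hypothetical helper): build an install.packages() call
# for the package names given as arguments, with the CRAN mirror set via
# repos so that R does not prompt for one. The mirror URL is an assumption.
r_install_expr() {
  local pkgs="" p
  for p in "$@"; do
    # join the quoted package names with ", "
    pkgs="${pkgs:+$pkgs, }\"$p\""
  done
  printf 'install.packages(c(%s), repos = "https://cloud.r-project.org")\n' "$pkgs"
}
```

After pasting the function into your interactive session, you could run the following, replacing the package names with your own:
[alice@build]$ R-3.x.x/lib64/R/bin/R -e "$(r_install_expr ggplot2 dplyr)"
R exits after evaluating the expression. Because -e implies --no-save, no workspace is written.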
Edit the R executable
Once you've added the packages you need, edit the R executable
that you used in the previous section. You can do this with any command-line
text editor; this example uses
nano:
[alice@build]$ nano R-3.x.x/lib64/R/bin/R
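The above will open up the main R executable. Near the top of the file is the line
that sets R_HOME_DIR. It currently holds the absolute path of the installation on the
build server, which will not exist when your jobs run, so change it to point to the
R directory in the job's working directory:
R_HOME_DIR=$(pwd)/R
Save and close the file. (In nano, this is CTRL-O to save, followed by CTRL-X to exit.)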
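You can make the same edit without opening an editor. This sketch assumes GNU sed (for in-place editing with -i). It also assumes the wrapper sets R_HOME_DIR on an unindented line starting with R_HOME_DIR=. The name set_relative_r_home is hypothetical. The single quotes keep $(pwd) literal in the file, so it is evaluated each time R starts, inside whatever directory the job runs in.

```shell
#!/bin/bash
# set_relative_r_home (hypothetical helper): rewrite the R_HOME_DIR assignment
# in the R wrapper script so that it points at ./R relative to the directory
# R is started from, instead of the absolute build-server path.
# Assumes GNU sed (-i) and an unindented "R_HOME_DIR=..." line.
set_relative_r_home() {
  local wrapper="$1"
  # only the unindented assignment is changed; indented ones are left alone
  sed -i 's|^R_HOME_DIR=.*|R_HOME_DIR=$(pwd)/R|' "$wrapper"
}
```

For example: set_relative_r_home R-3.x.x/lib64/R/bin/R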
Exit the Interactive Job
Right now, if we exit the interactive job, nothing will be transferred back
because we haven't created any new files in the working directory, just
sub-directories. In order to transfer back our installation, we will
need to compress it into a tarball - not only will HTCondor then transfer
back the resulting file, it is generally easier to transfer
a single, compressed tarball file than an uncompressed set of directories.
Move the directory with your R installation to the main working directory:
[alice@build]$ mv R-3.x.x/lib64/R ./
Run the following command to create your own tarball of the installation:
[alice@build]$ tar -czvf R.tar.gz R/
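Before exiting, it can be worth checking the tarball. The hypothetical helper below is a sketch that assumes GNU tar and an archive created from the R/ directory as above. It confirms that the archive contains the R wrapper script at R/bin/R and reports the archive's size. It also warns when the archive is larger than the 100 MB limit for transfer_input_files described in part 3. Treating 100 MB as 100 * 1024 * 1024 bytes is an assumption.

```shell
#!/bin/bash
# check_r_tarball (hypothetical helper): verify that the archive contains
# R/bin/R and warn if it is larger than 100 MB (taken as 100 * 1024 * 1024
# bytes). Returns 1 if R/bin/R is missing, 2 if the archive is too large.
check_r_tarball() {
  local tarball="${1:-R.tar.gz}" size limit=$((100 * 1024 * 1024))
  # listing a named member fails if that member is not in the archive
  if ! tar -tzf "$tarball" R/bin/R > /dev/null 2>&1; then
    echo "missing R/bin/R in $tarball" >&2
    return 1
  fi
  size=$(wc -c < "$tarball")
  if [ "$size" -gt "$limit" ]; then
    echo "$tarball is $size bytes: use squid instead of transfer_input_files" >&2
    return 2
  fi
  echo "$tarball looks OK ($size bytes)"
}
```

For example: check_r_tarball R.tar.gz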
The installation is complete! You can now exit the interactive job and your
R installation tarball will return to the submit server.
2. Creating a Script
We now have an
R.tar.gz file that contains our entire R
installation. In order to use this installation in our HTCondor jobs, we will need
to write a script that unpacks our R installation and then runs our R
code. We will use this script as the
executable of our HTCondor jobs.
A sample script appears below. After the first line, the lines starting
with hash marks are comments.
#!/bin/bash
# untar your R installation
tar -xzf R.tar.gz
# make sure the script will use your R installation
export PATH=$(pwd)/R/bin:$PATH
# run R, with the name of your R script
R CMD BATCH myscript.R
If you have additional commands you would like to be run within the job, you
can add them to this base script.
Arguments in R
To pass arguments to an R script within a job, you'll need to use the following
syntax in your main executable script, in place of the generic command above:
R CMD BATCH "--args argname1=$1 argname2=$2" myscript.R
Here, $1 and $2 are the first and second
arguments passed to the bash script from the submit file (see below),
which are then passed on to the R script. For more (or fewer) arguments,
simply add more (or fewer) argument names and numbers.
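Combining the sample script with this argument syntax gives an executable like the sketch below. It insists on exactly two arguments, so a missing value in the submit file's arguments line makes the job exit with an error instead of reaching R as an empty string. The names run_r_with_args, argname1 and argname2 are hypothetical. The script assumes R.tar.gz and myscript.R are transferred into the job's working directory.

```shell
#!/bin/bash
# run_r_with_args (hypothetical helper): unpack the portable R installation
# and forward exactly two arguments to myscript.R.
# argname1 and argname2 are placeholder names for your R script's arguments.
run_r_with_args() {
  if [ "$#" -ne 2 ]; then
    echo "usage: run_r_with_args value1 value2" >&2
    return 1
  fi
  # unpack the installation and put its bin directory first on the PATH
  tar -xzf R.tar.gz || return 1
  export PATH="$(pwd)/R/bin:$PATH"
  # double quotes keep "--args ..." as a single argument to R CMD BATCH
  R CMD BATCH "--args argname1=$1 argname2=$2" myscript.R
}

# In the executable script itself, you would end with:
# run_r_with_args "$@"
```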
In addition, your R script will need to be able to accept arguments from the
command line. There is sample code for doing this on
3. Submitting Jobs
The submit file you use for submitting your R jobs will be different from
the one you created in part 1 for building your
R installation. You'll want to create a new submit file; a good starting
point is the sample submit file on our hello world
example page. Make the following changes in order to run your R jobs:
- Set executable to the script that you wrote in part 2.
- Change transfer_input_files to include your
R installation tarball (R.tar.gz), your
R scripts, and any input files your job needs.
How big is your installation tarball?
If your installation tarball is larger than 100 MB, you should NOT transfer
the tarball using
transfer_input_files. Instead, you should use
CHTC's web proxy,
squid. To request space on
squid, email the research computing facilitators at
- If your script takes arguments (see the box from the
previous section), include those in the arguments line:
arguments = value1 value2
- Include the requirements line below to request the operating system
of the server your interactive job ran on (use OpSysMajorVer == 7 instead if you built R on a CentOS 7 server):
requirements = (OpSys == "LINUX") && (OpSysMajorVer == 6)
- Test a few jobs for disk space/memory usage in
order to make sure your requests for a large batch are accurate!
Disk space and memory usage can be found in the log file after jobs complete.
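The hypothetical helper below pulls those values out of a log file. It is a sketch that assumes the resource table HTCondor writes when a job terminates, with rows such as "Disk (KB) : usage request allocated" and "Memory (MB) : usage request allocated". The layout may differ between HTCondor versions, so check it against one of your own logs first.

```shell
#!/bin/bash
# job_usage (hypothetical helper): print the Disk and Memory "Usage" values
# from the resource table HTCondor writes to a job's log when it terminates.
# Assumes rows shaped like "Disk (KB) : <usage> <request> <allocated>";
# a log holding several such tables yields one pair of lines per table.
job_usage() {
  awk -F ':' '
    # the text after the colon starts with the Usage column
    /^[ \t]*Disk \(KB\)/   { split($2, f, " "); print "disk_kb=" f[1] }
    /^[ \t]*Memory \(MB\)/ { split($2, f, " "); print "memory_mb=" f[1] }
  ' "$1"
}
```

For example, job_usage job.log prints lines of the form disk_kb=... and memory_mb=.... You can compare these with the request_disk and request_memory values in your submit file.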