Docker Jobs

Linux containers are a way to build a self-contained environment that includes software, libraries, and other tools. This guide shows how to submit jobs that use Docker containers.

Overview

Typically, software in CHTC jobs is installed or compiled locally by individual users and then brought along to each job, either using the default file transfer or our SQUID web server. However, another option is to use a container system, where the software is installed in a container image. Using a container to handle software can be advantageous if the software installation 1) has many dependencies, 2) requires installation to a specific location, or 3) "hard-codes" paths into the installation.

CHTC (and the OSG) can access and start containers and run jobs inside them. This guide shows how to do this for Docker containers.

In order to run your job inside a Docker container, you will need to:

  1. Find or prepare a Docker container image for your jobs to use
  2. Test the container locally
  3. Make a few changes to your submit file

1. Getting a Docker Container Image

To run a Docker job, you will first need access to a Docker container image that has been built and placed onto the DockerHub website. There are two primary ways to do this.

A. Pre-existing Images

The easiest way to get a Docker container image for running a job is to use a public or pre-existing image on DockerHub. You can find images by getting an account on DockerHub and searching for the software you want to use.

Sample images:

An image supported by a group is typically updated over time, with each version identified by a "tag". We recommend choosing a specific tag (or tags) of the container to use in CHTC, so that the software version your jobs run does not change unexpectedly.

B. Build Your Own Image

You can also build your own Docker container image and upload it to DockerHub. See the Docker documentation for more information.

2. Testing the Container

The next step is to test the container on your own computer before submitting a job to CHTC. Note that all the steps below should be run on your own computer, not in CHTC. If you created your own container image on your computer, you can skip steps A and B and start with C.

A. Install Docker on your computer

Download, install, and start the Docker Community Edition for your operating system. It sometimes takes some time for Docker to start, especially the first time.

B. "Pull" the container image that you're using

We need to have a local copy of the Docker container image in order to test it. To do this, choose which image you want to use and the tag for the version you want. The syntax for the full container image name will be username/imagename:tag. Then pull a copy of this Docker container image to your computer by running the following from either a Terminal (Mac/Linux) or Command Prompt (Windows):

$ docker pull username/imagename:tag
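
To make the three parts of the image name concrete, the sketch below splits a name into username, imagename, and tag using bash parameter expansion, then prints the matching pull command without running Docker. The image name exampleuser/examplesoftware:1.2.3 is hypothetical, and the sketch assumes the name includes an explicit tag.

```shell
#!/bin/bash
# Hypothetical image name; substitute the one you found on DockerHub.
image="exampleuser/examplesoftware:1.2.3"

repo="${image%%:*}"       # everything before the colon: username/imagename
tag="${image##*:}"        # everything after the colon: the tag
username="${repo%%/*}"    # part of the repository before the slash
imagename="${repo#*/}"    # part of the repository after the slash

echo "username:  $username"
echo "imagename: $imagename"
echo "tag:       $tag"
echo "command:   docker pull $username/$imagename:$tag"
```

Writing the tag out explicitly, rather than leaving it off, makes it clear which version of the image your test (and later your jobs) will use.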

C. Choose your executable

There are two ways to run software inside a Docker container:

  1. Use a script that you transfer into the container, using software installed in the container.
  2. Use a script or executable program already inside the container.

Instructions for each of these use cases are below.

1. Using your own script (recommended)

Write a script that runs the steps of your job. Unlike in many of our guides, this script doesn't need to be written in a language like bash; instead, it can use something like Python or R directly from inside the container.

Any script run this way needs a header line at the top that indicates what kind of script it is. Some common headers include:

  • Bash:
    #!/bin/bash
  • Python:
    #!/usr/bin/env python
  • R:
    #!/usr/bin/env Rscript

Do I need an executable script? If your job only needs to run one command, you don't need a script to serve as the job's executable. See "Using an executable already inside the container" below.
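
As an illustration of the header line, the sketch below writes a minimal bash executable into a throwaway folder, gives it executable permissions, and runs it by path. The file name exec.sh matches the placeholder used later in this guide; the script contents are hypothetical.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing on your system is overwritten.
workdir="$(mktemp -d)"

# Write a minimal executable script; its first line is the header.
cat > "$workdir/exec.sh" <<'EOF'
#!/bin/bash
# Hypothetical job steps go here.
echo "Hello from the job script"
EOF

# Add executable permissions so the script can be run directly.
chmod +x "$workdir/exec.sh"

# Show the header, then run the script and capture what it prints.
first_line="$(head -n 1 "$workdir/exec.sh")"
echo "Header: $first_line"
job_output="$("$workdir/exec.sh")"
echo "$job_output"
```

The header tells the system which interpreter to start, so the same pattern applies to a Python or R script as long as that interpreter is installed in the container.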

2. Using an executable already inside the container

If the executable is already in the container, you simply need to know what command you need to run to use it.

D. Create a folder with job files

For testing, we need a folder on your computer to stand in for the directory that HTCondor creates for running your job. Create a folder for this purpose on your Desktop. The folder's name shouldn't include any spaces. Inside this folder, put all of the files that are normally inside the working directory for a single job -- data, scripts, etc. If you're using your own executable script, this should be in the folder.

Open a Windows Command Prompt or Mac/Linux Terminal to access that folder, using:

  • Mac/Linux:
    $ cd ~/Desktop/folder
  • Windows:
    $ cd %HOMEPATH%\Desktop\folder

Replace "folder" with the name of the folder you created.
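
On Mac/Linux, the folder setup can also be scripted. The sketch below defines a hypothetical helper, make_job_dir (not part of Docker or HTCondor), that creates the folder and refuses names containing spaces, following the advice above. The folder name docker_test in the usage comment is an assumption.

```shell
#!/bin/bash
# make_job_dir BASE NAME
# Creates BASE/NAME and prints its path; refuses names that contain spaces.
# (Hypothetical helper for illustration only.)
make_job_dir() {
    local base="$1" name="$2"
    case "$name" in
        *" "*)
            echo "Folder name must not contain spaces: '$name'" >&2
            return 1
            ;;
    esac
    mkdir -p "$base/$name" && echo "$base/$name"
}

# Example: create the folder on the Desktop and move into it.
# cd "$(make_job_dir "$HOME/Desktop" docker_test)"
```

After creating the folder, copy your data, scripts, and executable into it before starting the container.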

E. Start the Docker container

We will start the desired Docker container in order to see if it works. First make sure Docker is running. Then run the command below to start the container. The command can be run verbatim except for the username, imagename and tag; these should be whatever you used to pull or tag the container image.

  • Mac/Linux:
    $ docker run --user $(id -u):$(id -g) --rm=true -it \
      -v "$(pwd)":/scratch -w /scratch \
      username/imagename:tag /bin/bash
  • Windows:
    $ docker run --rm=true -it -v %CD%:/scratch -w /scratch username/imagename:tag /bin/bash

In this command, -v makes the current folder available inside the container as /scratch, -w makes /scratch the working directory, -it opens an interactive terminal, and --rm=true removes the container when you exit. On Mac/Linux, --user runs the commands inside the container with your own user and group ID. The final /bin/bash starts a bash shell in the container.

For Windows users, a window may pop up, asking for permission to share your main drive with Docker. This is necessary for the files to be placed inside the container.

F. Test the job

Your command line prompt should have changed to include a container ID (a short string of letters and numbers that identifies the running container instance). We can now see if the job would complete successfully! If you have an executable script, you can run it like so:

bob@12335:/scratch$ ./exec.sh

If your "executable" is software already in the container, run the appropriate command to use it.

The commands below may not be necessary, but if you see "Permission denied" messages or a bash error about bad formatting, try one (or both) of them:

You may need to add executable permissions to the script for it to run correctly:

bob@12335:/scratch$ chmod +x exec.sh

Windows users who are using a bash script may also need to run the following two commands:

bob@12335:/scratch$ cat exec.sh | tr -d \\r > temp.sh
bob@12335:/scratch$ mv temp.sh exec.sh

Replace exec.sh with the name of your own executable.
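
The bash formatting error typically comes from Windows-style line endings, where a carriage return precedes each newline. The sketch below, run in a throwaway folder with a hypothetical exec.sh, shows one way to detect those characters and strip them; it has the same effect as the two commands above.

```shell
#!/bin/bash
workdir="$(mktemp -d)"

# Create a hypothetical script with Windows (CRLF) line endings.
printf '#!/bin/bash\r\necho "done"\r\n' > "$workdir/exec.sh"

# Count the lines that contain a carriage return character.
before="$(grep -c $'\r' "$workdir/exec.sh")"

# Strip the carriage returns, then replace the original file.
tr -d '\r' < "$workdir/exec.sh" > "$workdir/temp.sh"
mv "$workdir/temp.sh" "$workdir/exec.sh"

after="$(grep -c $'\r' "$workdir/exec.sh")"
echo "Lines with carriage returns: before=$before after=$after"
```

Because temp.sh is a newly created file, the replaced exec.sh may no longer have executable permissions, so you may need to run chmod +x exec.sh again afterwards.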

When your test is done, type "exit" to leave the container:

bob@12335:/scratch$ exit

If the program didn't work, try searching for the cause of the error messages, or email CHTC's Research Computing Facilitators.

If your local test did run successfully, you are now ready to set up your Docker job to run on CHTC.

3. Submit File Customization

Jobs that run inside a Docker container are almost exactly the same as "vanilla" HTCondor jobs. The submit file needs two kinds of customization: lines that select the docker universe and the Docker container image to use, and the usual list of your particular executable and input files.

A. Using a Docker Image

Start with a usual CHTC submit file like the one shown in our Hello World guide. Then, make the following two changes:
  1. Change the universe from "vanilla" to "docker":
    universe = docker
  2. Add a line to indicate which Docker image you want to use for running your job:
    docker_image = user_name/image_name:tag

When your job starts, HTCondor will pull the indicated image from DockerHub, and use it to run your job.

B. Executable and Input Files

Your wrapper script from the test on your computer should be listed as the job's executable. The other needed files from your test directory should be listed in transfer_input_files.
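
Putting sections A and B together, a submit file for a Docker job might look like the sketch below, written here from the shell with a heredoc. The file name docker.sub, the image name, the input file input.dat, and the resource requests are all placeholder assumptions. The log, output, error, and request lines are the usual ones from a vanilla submit file and should be adapted from your own.

```shell
#!/bin/bash
# Write a hypothetical submit file named docker.sub in the current directory.
cat > docker.sub <<'EOF'
# docker.sub: sketch of a Docker job submit file (all values are placeholders)
universe = docker
docker_image = user_name/image_name:tag

executable = exec.sh
transfer_input_files = input.dat

log = job.log
output = job.out
error = job.err

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue
EOF

# Show the two Docker-specific lines.
grep -E '^(universe|docker_image)' docker.sub
```

On a CHTC submit server, you would then submit the job with condor_submit docker.sub, as with any other HTCondor job.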