Build a Docker Container Image
Linux containers are a way to build a self-contained environment that includes software, libraries, and other tools. CHTC currently supports running jobs inside Docker containers. This guide describes how to build a Docker image that you can use for running jobs in CHTC. For information on using this image for jobs, see our Docker Jobs guide.
Note that all the steps below should be run on your own computer, not in CHTC.
Docker images can be created using a special file format called a “Dockerfile”. This file has commands that allow you to:
- use a pre-existing Docker image as a base
- add files to the image
- run installation commands
- set environment variables
You can then “build” an image from this file, test it locally, and push it to DockerHub, where HTCondor can then use the image to build containers to run jobs in. Different versions of the image can be labeled with different version “tags”.
This guide has:
A. Step by Step Instructions
1. Set Up Docker on Your Computer
If you haven’t already, create a DockerHub account and install Docker on your computer. You’ll want to look for the Docker Community Edition for your operating system. Docker can take a while to start, especially the first time. Once Docker starts, it won’t open a window; you’ll just see a little whale and container icon in one of your computer’s toolbars. To actually use Docker, you’ll need to open a command line program (like Terminal or Command Prompt) and run commands there.
2. Explore Docker Containers (optional)
If you have never used Docker before, we recommend exploring a pre-existing container and testing out installation steps interactively before creating a Dockerfile. See the first half of this guide: Exploring and Testing a Docker Container
3. Create a Dockerfile
A Dockerfile is a plain text file with keywords that add elements to a Docker image. There are many keywords that can be used in a Dockerfile (documented on the Docker website here: Dockerfile keywords), but we will use a subset of these keywords following this basic outline:
- Starting point: Which Docker image do you want to start with?
- Additions: What needs to be added? Folders? Data? Other software?
- Environment: What variables (if any) are set as part of the software installation?
Create the file
Create a blank text file named Dockerfile. If you are planning on making multiple images for different parts of your workflow, you should create a separate folder for each new image, with a Dockerfile inside each of them.
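As a minimal sketch of that folder-per-image layout, the commands below assume a POSIX shell (Terminal on Mac/Linux) and use hypothetical names: a workflow folder called my-workflow and two images called alignment and analysis.

```shell
# Hypothetical workflow folder and image names; substitute your own.
workflow_dir="my-workflow"

for image in alignment analysis; do
    # One folder per image, each holding its own (initially blank) Dockerfile
    mkdir -p "$workflow_dir/$image"
    touch "$workflow_dir/$image/Dockerfile"
done

# List the Dockerfiles that now exist
find "$workflow_dir" -name Dockerfile | sort
```

Keeping each Dockerfile in its own folder also keeps the files that each image copies in separate from one another.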
Choose a base image with FROM
Usually you don’t want to start building your image from scratch. Instead you’ll want to choose a “base” image to add things to.
You can find a base image by searching DockerHub. If you’re using a scripting language like Python, R, or Perl, you could start with the “official” image for that language. If you’re not sure what to start with, a basic Linux image (Debian, Ubuntu, and CentOS are common examples) is often a good place to start.
Images often have tagged versions. Besides choosing the image you want, make sure to choose a version by clicking on the “Tags” tab of the image.
Once you’ve decided on a base image and version, add it as the first line of your Dockerfile, like this:
Some images are maintained by DockerHub itself (these are the “official” images mentioned above), and do not have a repository. For example, to start with CentOS 7, you could use
while starting from one of HTCondor’s HTC Jupyter notebook images might look like
When possible, you should use a specific tag (not the automatic latest tag).
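To illustrate pinning a tag, the sketch below writes a small Dockerfile whose first instruction names a base image together with an explicit version. The folder name from-example is hypothetical, and python:3.8 is only one example of an official image with a version tag; the same pattern applies to whatever base image you choose.

```shell
# Hypothetical folder name, used so that no existing Dockerfile is overwritten
mkdir -p from-example

cat > from-example/Dockerfile <<'EOF'
# Names an explicit version tag. Writing only "FROM python" would
# silently mean python:latest, which changes over time.
FROM python:3.8
EOF

cat from-example/Dockerfile
```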
Here are some base images you might find useful to build off of:
Install packaged software with RUN
The next step is the most challenging. We need to add commands to the Dockerfile to install the desired software. There are a few standard ways to do this:
- Use a Linux package manager. This is usually apt-get for Debian-based containers (e.g., Ubuntu) or yum for RedHat Linux containers (e.g., CentOS).
- Use a software-specific package manager (like pip for Python).
- Use installation instructions (usually a progression of configure, make, and make install).
Each of these options will be prefixed by the RUN keyword. You can join together linked commands with the && symbol; to break lines, put \ at the end of the line.
RUN can execute any command inside the image during construction, but keep in mind that the only things kept in the final image are changes to the filesystem (new and modified files, directories, etc.).
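One way to see why only filesystem changes survive is to treat each RUN line as its own short-lived shell. The sketch below is an analogy that runs without Docker: each sh -c call stands in for one RUN instruction, and MY_SETTING and marker.txt are hypothetical names chosen for illustration.

```shell
# Analogy only: each "sh -c" call plays the role of one RUN instruction.
unset MY_SETTING
workdir=$(mktemp -d)

# "RUN" 1: export a variable and write a file
sh -c 'export MY_SETTING=on; echo "installed" > "$1/marker.txt"' sh "$workdir"

# "RUN" 2: a fresh shell, so the variable is gone but the file is still there
result=$(sh -c 'echo "MY_SETTING=${MY_SETTING:-unset} marker=$(cat "$1/marker.txt")"' sh "$workdir")
echo "$result"

rm -rf "$workdir"
```

This is why commands that depend on each other are chained with && inside a single RUN, and why variables that must persist in the image are set with the ENV keyword rather than with export.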
For example, suppose that your job’s executable ends up running Python and needs access to the packages numpy and scipy, as well as the Unix tool wget. Below is an example of a Dockerfile that uses RUN to install these packages using the system package manager and Python’s built-in package manager.
# Build the image based on the official Python version 3.8 image
FROM python:3.8
# Our base image happens to be Debian-based, so it uses apt-get as its system package manager
# Use apt-get to install wget; -y answers the confirmation prompt, since a build cannot take input
RUN apt-get update \
 && apt-get install -y wget
# Use RUN to install Python packages (numpy and scipy) via pip, Python's package manager
RUN pip3 install numpy scipy
If you need to copy specific files (like source code) from your computer into the image, place the files in the same folder as the Dockerfile and use the COPY keyword. You could also download files within the image by using the RUN keyword and a download command such as wget.
For example, suppose that you need to use JAGS and the rjags package for R. If you have the JAGS source code downloaded next to the Dockerfile, you could compile and install it inside the image like so:
# COPY the JAGS source code into the image under /tmp
COPY JAGS-4.3.0.tar.gz /tmp
# RUN a series of commands to unpack the JAGS source, compile it, and install it
RUN cd /tmp \
&& tar -xzf JAGS-4.3.0.tar.gz \
&& cd JAGS-4.3.0 \
&& ./configure \
&& make \
&& make install
# install the R package rjags
RUN install2.r --error rjags
Set up the environment with ENV
Your software might rely on certain environment variables being set correctly. One common situation is that if you’re installing a program to a custom location (like a home directory), you may need to add that directory to the image’s system PATH. For example, if you installed some scripts to a custom directory, you could use ENV to prepend that directory to your PATH.
You can set multiple environment variables at once:
ENV DEBIAN_FRONTEND=noninteractive \
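As a hedged sketch of both patterns, the commands below write a Dockerfile that prepends a script directory to PATH and sets two variables in one ENV instruction. Everything specific here is an assumption made for illustration: the folder env-example, the install location /opt/mytools/bin, the python:3.8 base image, and the second variable TZ=Etc/UTC.

```shell
# Assumptions for illustration: /opt/mytools/bin is a hypothetical install
# location, and TZ=Etc/UTC is a hypothetical second variable.
mkdir -p env-example

# The quoted 'EOF' keeps ${PATH} literal so Docker, not this shell, expands it
cat > env-example/Dockerfile <<'EOF'
FROM python:3.8

# Prepend the hypothetical script directory to the image's existing PATH
ENV PATH="/opt/mytools/bin:${PATH}"

# Set several variables in one ENV instruction, one per continued line
ENV DEBIAN_FRONTEND=noninteractive \
    TZ=Etc/UTC
EOF

cat env-example/Dockerfile
```

Referring to ${PATH} on the right-hand side keeps the directories that the base image already placed on PATH.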
4. Build, Name, and Tag the Image
So far we haven’t actually created the image – we’ve just been listing instructions for how to build the image in the Dockerfile. Now we are ready to build the image!
First, decide on a name for the image, as well as a tag. Tags are important for tracking which version of the image you’ve created (and are using). A simple tag scheme would be to use numbers (e.g. v0, v1, etc.), but you can use any system that makes sense to you.
Because HTCondor caches Docker images by tag, we strongly recommend that you never use the latest tag, and always build images with a new, unique tag that you then explicitly specify in new jobs.
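If you script your builds, a small guard can enforce this recommendation. The function below is a hypothetical helper (image_ref is not part of Docker or HTCondor): it assembles a name of the form username/imagename:tag and refuses an empty tag or latest. The values alice, myimage, and v1 are placeholders.

```shell
# Hypothetical helper: assemble username/imagename:tag and refuse the "latest" tag.
image_ref() {
    ref_user="$1"
    ref_name="$2"
    ref_tag="$3"
    if [ -z "$ref_tag" ] || [ "$ref_tag" = "latest" ]; then
        echo "error: pick an explicit, unique tag instead of 'latest'" >&2
        return 1
    fi
    echo "$ref_user/$ref_name:$ref_tag"
}

# Placeholder values, not a real account or image
image_ref alice myimage v1
```

The printed name could then be passed to the build command, for example docker build -t "$(image_ref alice myimage v1)" . from the folder containing the Dockerfile.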
To build and tag your image, open a Terminal (Mac/Linux) or Command Prompt (Windows) and navigate to the folder that contains your Dockerfile:
$ cd directory
(Replace directory with the path to the appropriate folder.)
Then make sure Docker is running (there should be an icon on your status bar, and running docker info shouldn’t indicate any errors) and run:
$ docker build -t username/imagename:tag .
Replace username with your Docker Hub username and replace imagename and tag with the values of your choice. Note the . at the end of the command (to indicate “the current directory”).
If you get errors, try to determine what you may need to add or change to your Dockerfile and then run the build command again. Debugging a Docker build is largely the same as debugging any software installation process.
5. Test Locally
This page describes how to interact with your new Docker image on your own computer, before trying to run a job with it in CHTC:
6. Push to DockerHub
Once your image has been successfully built and tested, you can push it to DockerHub so that it will be available to run jobs in CHTC. To do this, run the following command:
$ docker push username/imagename:tag
(Where you once again replace username/imagename:tag with what you used in the build step.)
The first time you push an image to DockerHub, you may need to run this command beforehand:
$ docker login
It should ask for your DockerHub username and password.
If you have a free account on Docker Hub, any container image that you have pushed there will be scheduled for removal if it is not used (pulled) at least once every 6 months (See the Docker Terms of Service).
For this reason, and just because it’s a good idea in general, we recommend creating a file archive of your container image and placing it in whatever space you use for long-term, backed-up storage of research data and code.
To create a file archive of a container image, use this command, changing the name of the archive file and container to reflect the names you want to use:
$ docker save --output archive-name.tar username/imagename:tag
It’s also a good idea to archive a copy of the Dockerfile used to generate a container image along with the file archive of the container image itself.
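One possible naming convention, sketched below, derives the archive file name from the image name so that the two stay associated. The convention itself is an assumption, not a CHTC requirement. The script only prints the commands, so it runs without Docker; docker load --input is the counterpart of docker save and restores an image from such an archive.

```shell
# Placeholder image name, in the same form used elsewhere in this guide
image="username/imagename:tag"

# "/" and ":" are awkward in file names, so map both to "_"
archive="$(printf '%s' "$image" | tr '/:' '__').tar"

# Print the commands instead of running them, so this sketch works without Docker
echo "docker save --output $archive $image"
echo "docker load --input $archive"
```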
7. Running Jobs
Once your Docker image is on Docker Hub, you can use it to run jobs on CHTC’s HTC system. See our Docker Jobs guide for more details.
This section holds various example Dockerfiles that cover more advanced use cases.
Installing a Custom Python Package from GitHub
Suppose you have a custom Python package hosted on GitHub, but not available on PyPI. Since pip can install packages directly from git repositories, you could install your package like this:
RUN pip3 install git+https://github.com/<RepositoryOwner>/<RepositoryName>
where you would replace <RepositoryOwner> and <RepositoryName> with your own values.
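# Install Miniconda from an installer script placed next to the Dockerfile,
# then create a QIIME 2 conda environment
COPY Miniconda3-latest-Linux-x86_64.sh /tmp
RUN mkdir /home/qiimeuser
# -b runs the installer in batch mode; -y answers conda's confirmation prompts,
# since a build cannot take interactive input
RUN cd /tmp \
 && ./Miniconda3-latest-Linux-x86_64.sh -b -p /home/qiimeuser/miniconda3 \
 && export PATH=/home/qiimeuser/miniconda3/bin:$PATH \
 && conda update -y conda \
 && conda create -y -n qiime2-2017.10 --file https://data.qiime2.org/distro/core/qiime2-2017.10-conda-linux-64.txt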