Running Python Jobs on CHTC
To best understand the below information, users
should already have an understanding of:
Many CHTC users have Python programs that require Python versions not installed
on CHTC's high throughput system, which includes the CHTC Pool, the UW Grid
(flocking), and the Open Science Grid (GlideIn). Instead, you choose
the version of Python you want and bring it along with your jobs.
This guide details the steps needed to:
- Build a Python installation for use in your jobs
- Write a script that unpacks your Python installation and
runs your Python code
- Submit jobs
1. Building a Python Installation
To run Python jobs, you will first need to build a Python installation for your
jobs to use.
A. Get the Python Version You Need
Before starting, locate the version of Python that you want to use
from python.org. Transfer or download the .tgz file to the submit server.
Instead of installing Python from source, it is also possible to create a Python
installation using a Python distribution. Examples include
Anaconda and Miniconda (from Continuum Analytics) and the distribution
from Enthought. The only change to the instructions
below will be the source file (the distribution's install script, instead of
source code) and the exact commands required to create a local installation
(Step 3 below). Otherwise,
the process is nearly identical - install the Python distribution locally and
create a tarball of the installed directory.
One major drawback of using a distribution is the size of the installation - the
full Anaconda distribution is over 300 MB, whereas a Python installation from source
with a few packages is less than 40 MB.
B. Create a Python Installation in an Interactive Job
Because a Python installation can be computationally intensive, it should not be
performed on the submit server. Instead, you can create your installation
on a dedicated build server by using an interactive job.
The interactive job is essentially a job without an executable;
you are the one running the commands instead (in this case, to install Python).
As with a regular HTCondor job, once you finish your installation on the build server,
the output files (here, your Python installation) will be transferred back to
the submit server so that you can use them to submit your jobs.
Submit an Interactive Build Job
Follow CHTC's instructions for submitting an interactive build job.
Note that you should replace source_code.tar.gz with the name of
the Python source tarball that you downloaded. If you downloaded additional source
code for modules in part A, you should list those in the
transfer_input_files line as well.
Submit the interactive job and wait for it to start.
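The submit file itself is not reproduced in this guide. As a rough sketch only, an interactive build submit file built from standard HTCondor submit commands might be written as shown here. The file name build.sub, the log name, and the resource requests are assumptions chosen for illustration, and CHTC's own instructions may require additional site-specific lines.

```shell
#!/bin/bash
# Sketch: write a minimal submit file for an interactive build job.
# build.sub, interactive.log, and the request_* values are illustrative assumptions.
cat > build.sub <<'EOF'
# Interactive build job: no executable is needed
universe = vanilla
log = interactive.log

# Replace source_code.tar.gz with your Python source tarball,
# and add any extra module source files, separated by commas
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = source_code.tar.gz

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue
EOF

echo "Wrote build.sub; submit it interactively with: condor_submit -i build.sub"
```

The -i option asks condor_submit for an interactive session, so you land in a shell on the build machine once the job starts.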
Prepare the Installation Directory
Once the interactive job starts, create a directory
for the installation:
[alice@build]$ mkdir python
Next, untar the source code that you transferred over.
In the command below, replace python_source.tgz
with the name of your Python tarball.
[alice@build]$ tar -xzf python_source.tgz
To install Python, we will run a configuration script that includes
an option to set the installation location. We will set the location
to the directory we created above, and then complete the installation by
running make install.
Move into the untarred Python source directory
(it should be named something like "Python-#.#").
[alice@build]$ cd Python-#.#
From that directory, type the following commands to compile and
install Python to the directory you created in step 2:
[alice@build]$ ./configure --prefix=$(pwd)/../python
[alice@build]$ make install
Check the Installation
Once these commands have finished executing, move back into the main working directory.
[alice@build]$ cd ..
Then, check the contents of your python directory. It should
look like this:
[alice@build]$ ls python
bin include lib share
Finally, make sure you have a python executable. Run:
[alice@build]$ ls python/bin
You should see something like this:
2to3 idle3 pydoc3 python3.4-config pyenv
2to3-3.4 idle3.4 pydoc3.4 python3.4m pyenv-3.4
easy_install-3.4 pip3.4 python3 python3.4m-config
f2py3.4 pip3 python3.4 python3-config
The number of items may vary, depending on which version of Python you
used. If you do not see a plain python executable
(as in the listing above), do the following:
[alice@build]$ cp python/bin/python3 python/bin/python
Replace "python3" with "python2" if that is the version you installed.
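The check-and-copy step can also be scripted. The following is a minimal sketch, assuming the installation directory is named python as in this guide; ensure_plain_python is a hypothetical helper name, not part of any CHTC tooling.

```shell
#!/bin/bash
# Sketch: make sure the installation has a plain "python" executable.
# ensure_plain_python is a hypothetical helper name used only for illustration.
ensure_plain_python() {
    local bindir="$1/bin"
    if [ -x "$bindir/python" ]; then
        echo "found $bindir/python"
    elif [ -x "$bindir/python3" ]; then
        cp "$bindir/python3" "$bindir/python"
        echo "copied python3 to $bindir/python"
    elif [ -x "$bindir/python2" ]; then
        cp "$bindir/python2" "$bindir/python"
        echo "copied python2 to $bindir/python"
    else
        echo "no python executable found in $bindir" >&2
        return 1
    fi
}

# Run against the installation directory created earlier, if it exists
if [ -d python/bin ]; then
    ensure_plain_python python
fi
```

The function leaves an existing python executable untouched, so it is safe to run more than once.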
If you are installing any additional modules, do so now.
First, set the PATH variable to include your Python installation:
[alice@build]$ export PATH=$(pwd)/python/bin:$PATH
Install pip, a Python package manager. Go to the pip
documentation page and follow the directions under
"Installing with get-pip.py". You can download the get-pip.py
script by copying the link to the script and then typing:
[alice@build]$ wget http://link.to.get-pip.py
Then, for each module needed by your code, run:
[alice@build]$ pip install module_name
pip should download all dependent packages and install them. Certain
modules may take longer than others.
Exit the Interactive Job
Right now, if we exit the interactive job, nothing will be transferred back
because we haven't created any new files in the working directory, just
sub-directories. In order to transfer back our installation, we will
need to compress it into a tarball file - not only will HTCondor then transfer
back the file, it is generally easier to transfer
a single, compressed tarball file than an uncompressed set of directories.
Run the following command to create your own tarball of the installation:
[alice@build]$ tar -czvf python.tar.gz python/
The installation is complete! You can now exit the interactive job and the
tarball of your Python installation will return to the submit server with you.
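Before relying on the tarball, it can help to confirm its size and contents, since this guide advises against using transfer_input_files for tarballs larger than roughly 40-50 MB. The following is a minimal sketch; check_tarball is a hypothetical helper name, and python.tar.gz is the tarball name used in this guide.

```shell
#!/bin/bash
# Sketch: report the size of a tarball in MB and list its first few entries.
# check_tarball is a hypothetical helper name used only for illustration.
check_tarball() {
    local tarball="$1"
    if [ ! -f "$tarball" ]; then
        echo "missing: $tarball" >&2
        return 1
    fi
    # Size in whole megabytes, rounded up
    local bytes
    bytes=$(wc -c < "$tarball")
    local mb=$(( (bytes + 1048575) / 1048576 ))
    echo "$tarball is about ${mb} MB"
    # Show the first few archive entries; they should start with python/
    tar -tzf "$tarball" | sed -n '1,5p'
}

# Check the tarball created above, if it exists
if [ -f python.tar.gz ]; then
    check_tarball python.tar.gz
fi
```

If the entries do not begin with python/, the tarball was probably created from inside the installation directory rather than from its parent.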
2. Creating a Script
We now have a python.tar.gz file that contains our entire Python
installation. In order to use this installation in our HTCondor jobs, we will need
to write a script that unpacks our Python installation and then runs our Python
code. We will use this script as the executable of our HTCondor jobs.
A sample script appears below. After the first line, the lines starting
with hash marks are comments. You should replace "myscript.py" with the name of
the script you would like to run.
#!/bin/bash
# untar your Python installation
tar -xzf python.tar.gz
# make sure the script will use your Python installation,
# and the working directory as its home location
export PATH=$(pwd)/python/bin:$PATH
export HOME=$(pwd)
# run your script
python myscript.py
If you have additional commands you would like to be run within the job, you
can add them to this base script. Once your script does what you would like, give
it executable permissions by running:
[alice@submit]$ chmod +x run_python.sh
3. Submitting Jobs
A sample submit file can be found in our hello world
example page. You should make the following changes in order to run
Python jobs:
- Set executable to the script that you wrote in the previous section.
- Modify transfer_input_files to include your
Python installation tarball (python.tar.gz), your
Python scripts, and any input files your job needs.
How big is your installation tarball?
If your installation tarball is larger than 40-50 MB, you should NOT transfer
the tarball using transfer_input_files. Instead, you should use
CHTC's web proxy, squid. To request space on squid, email the research
computing facilitators.
- Add the following requirements line in order to request the correct
operating system version:
requirements = (OpSys == "LINUX") && (OpSysMajorVer == 6)
- Modify the CPU/memory request lines. Test a few jobs for disk space/memory usage in
order to make sure your requests for a large batch are accurate!
Disk space and memory usage can be found in the log file after the job completes.
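Putting these changes together, a submit file for a Python job might look like the sketch below. It assumes the tarball is small enough to send with transfer_input_files and that the wrapper script is named run_python.sh. The file name python_job.sub, the log, output, and error names, and the request_* values are assumptions for illustration, not CHTC recommendations.

```shell
#!/bin/bash
# Sketch: write a submit file for a Python job.
# python_job.sub, the log/output/error names, and the request_* values are
# illustrative assumptions; adjust them after testing a few jobs.
cat > python_job.sub <<'EOF'
universe = vanilla
executable = run_python.sh

log = python_job_$(Cluster).log
output = python_job_$(Cluster)_$(Process).out
error = python_job_$(Cluster)_$(Process).err

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = python.tar.gz, myscript.py

requirements = (OpSys == "LINUX") && (OpSysMajorVer == 6)

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue 1
EOF

echo "Wrote python_job.sub; submit it with: condor_submit python_job.sub"
```

The heredoc delimiter is quoted so that the shell leaves $(Cluster) and $(Process) for HTCondor to expand. After a test job finishes, compare the usage reported in its log file against the request_* values before submitting a large batch.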