Powered by:
Open Science Grid
Center for High Throughput Computing

Running Matlab Jobs on CHTC

The examples and information in this guide work best for the following cases*:

  • Submission to an HTCondor system with file transfer (rather than a shared filesystem).
  • Submission to an HTCondor system that is Unix-based (Linux or Mac operating system; Windows may have important differences).
(*see the HTCondor software manual and online examples from other organizations for cases outside of those we cover on the CHTC website)

To best understand the below information, users should already have an understanding of:

Overview

Like most programs, Matlab is not installed on CHTC's high throughput compute system. One way to run Matlab code where Matlab isn't installed is to compile the .m files into a binary and run that binary with the Matlab runtime, a set of files that provides what the compiled code needs. Running Matlab in CHTC therefore involves the following steps, each detailed in the sections below:

  1. Prepare your Matlab program
  2. Write a submit file that uses the compiled code and script

If your Matlab code depends on random number generation (for example, through functions like rand or randperm), also see the Ensuring Randomness section below.

1. Preparing Your Matlab Program

You can compile .m files into a Matlab binary yourself by requesting an interactive session on one of our build machines. The session is essentially a job without an executable; you are the one running the commands instead (in this case, to compile the code).

Instructions for submitting an interactive build job are here: http://chtc.cs.wisc.edu/inter-submit.shtml. You'll need to do Step 2 (Creating Interactive Submit Files) and the first command of Step 3 (Submitting and Working Interactively).
For Step 2, change transfer_input_files to list all the .m files on which your program depends. These files need to be on the submit server before you submit the interactive job for compiling. If your code has many files or directories, we recommend compressing them into a tarball (.tar.gz) and transferring that instead.
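A minimal sketch of that packaging step, run on the submit server. It assumes all of your .m files sit in one directory, and make_compile_job and compile.sub are hypothetical names. The submit-file lines, in particular +IsBuildJob = true, are assumptions modeled on CHTC's interactive build guide linked above, so check them against that guide; the request values are placeholders.

```shell
# make_compile_job DIR: bundle DIR (your .m files) into DIR.tar.gz and write an
# interactive build submit file, compile.sub, that transfers the tarball.
# The submit-file lines are assumptions based on CHTC's interactive job guide.
make_compile_job() {
    code_dir="$1"
    tarball="$(basename "$code_dir").tar.gz"

    # Compress the whole directory so every dependent .m file is transferred.
    tar czf "$tarball" "$code_dir" || return 1

    # Unquoted EOF so that the shell fills in $tarball.
    cat > compile.sub <<EOF
universe = vanilla
log = compile.log

# Assumed CHTC flag that sends the job to a build machine; check the guide.
+IsBuildJob = true

transfer_input_files = $tarball

# Placeholder resource requests.
request_cpus = 1
request_memory = 4GB
request_disk = 4GB

queue
EOF
}
```

For example, make_compile_job my_code (a hypothetical directory name) creates my_code.tar.gz and compile.sub; condor_submit -i compile.sub then starts the interactive session.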

A. Compile Matlab Code

Once you've done Steps 2 and 3 of the interactive job guide and the interactive job has started, you can compile your code. If you transferred your code as a tarball, extract it first (tar xzf followed by the tarball's name). In this example, foo.m represents the name of the primary Matlab script; replace foo.m with the name of your own primary script. Note that if your main script references other .m files, they will all be compiled together with the main script into one binary.

Choose one of the following compile commands, based on the version of Matlab you'd like to use:

[alice@build]$ /usr/local/MATLAB/R2015b/bin/mcc -m -R -singleCompThread -R -nodisplay -R -nojvm foo.m
[alice@build]$ /usr/local/MATLAB/R2014b/bin/mcc -m -R -singleCompThread -R -nodisplay -R -nojvm foo.m
[alice@build]$ /usr/local/MATLAB/R2013b/bin/mcc -m -R -singleCompThread -R -nodisplay -R -nojvm foo.m

There are other options for the mcc Matlab compiler. If you have questions about your particular code, contact a facilitator or see the Matlab documentation.
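If you compile often, the choice of release can be wrapped in a small shell function on the build machine. This is a hypothetical helper (compile_matlab and the MATLAB_ROOT override are not CHTC tools); it assumes the /usr/local/MATLAB/<release> layout from the commands above and runs the same mcc command, with the flags annotated.

```shell
# compile_matlab RELEASE MAIN_M: run mcc from the requested Matlab release.
# MATLAB_ROOT is a hypothetical override; it defaults to the path used above.
compile_matlab() {
    release="$1"   # e.g. R2015b, R2014b or R2013b
    main_m="$2"    # your primary script, e.g. foo.m
    mcc="${MATLAB_ROOT:-/usr/local/MATLAB}/$release/bin/mcc"

    if [ ! -x "$mcc" ]; then
        echo "mcc not found: $mcc" >&2
        return 1
    fi

    # -m                    build a standalone application (binary + run_*.sh)
    # -R -singleCompThread  runtime option: use a single computational thread
    # -R -nodisplay         runtime option: run without a display
    # -R -nojvm             runtime option: do not start the Java virtual machine
    "$mcc" -m -R -singleCompThread -R -nodisplay -R -nojvm "$main_m"
}
```

For example, compile_matlab R2015b foo.m is equivalent to the first command above.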

B. Modifying the Executable

The mcc command should have created a script called run_*.sh (where * is the name of your Matlab script; our example uses the name foo). This run_*.sh script will be the executable for your Matlab jobs and already contains almost all the commands needed to run your Matlab code. You'll need to add a line near the beginning of the run_*.sh script that unpacks the Matlab runtime, plus two lines that create a local cache directory and point MCR_CACHE_ROOT at it, which helps Matlab run smoothly on any Linux system.

The lines to add go right after the script's opening comments, like this (replace r2015b.tar.gz with the appropriate version of Matlab if you compiled with a different one):

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MATLAB Runtime environment for the current $ARCH and executes 
# the specified command.

# Add these lines to run_foo.sh
tar xzf r2015b.tar.gz
mkdir cache
export MCR_CACHE_ROOT=`pwd`/cache

# Rest of script follows
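Instead of editing run_foo.sh in a text editor, the same three lines can be inserted with a short shell function. This is a hypothetical helper, not something mcc or CHTC provides; it places the lines directly after the #!/bin/sh line, which is still before any of the script's own commands.

```shell
# add_runtime_setup SCRIPT TARBALL: insert the runtime-unpacking lines into an
# mcc-generated run script, right after its first (#!/bin/sh) line.
add_runtime_setup() {
    script="$1"    # e.g. run_foo.sh
    tarball="$2"   # e.g. r2015b.tar.gz (must match the compiling release)
    tmp="$script.tmp"

    [ -f "$script" ] || return 1

    {
        head -n 1 "$script"                        # keep #!/bin/sh first
        echo "tar xzf $tarball"                    # unpack the Matlab runtime
        echo "mkdir cache"                         # local cache directory
        echo 'export MCR_CACHE_ROOT=`pwd`/cache'   # single quotes keep the backticks literal
        tail -n +2 "$script"                       # the rest of the original script
    } > "$tmp" || return 1

    mv "$tmp" "$script"
    chmod +x "$script"                             # the rewritten file needs execute permission
}
```

Run it once per script, for example add_runtime_setup run_foo.sh r2015b.tar.gz; running it a second time would add the lines twice.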

Type exit after you have compiled your code (step A) and edited the executable script (step B). HTCondor will transfer your compiled code and its scripts back to the submit server automatically.

Back on the submit node, you should now have the following files:

[alice@submit]$ ls -l 

-rw-rw-r-- 1 user user 581724 Feb 19 14:21 mccExcludedFiles.log
-rwxrw-r-- 1 user user  94858 Feb 19 14:21 foo
-rw-rw-r-- 1 user user   3092 Feb 19 14:21 readme.txt
-rw-rw-r-- 1 user user 581724 Feb 19 14:21 requiredMCRProducts.txt
-rwxrw-r-- 1 user user   1195 Feb 19 14:21 run_foo.sh

The file foo is the compiled Matlab binary. You will not need the mccExcludedFiles.log, requiredMCRProducts.txt or readme.txt to run your jobs.

Occasionally the compiled Matlab binary loses its "executable" permission. If that happens, you can restore it by running the following command:

[alice@submit]$ chmod +x foo

Again, replace foo with the name of your own compiled binary.
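Before writing the submit file, a quick check on the submit server can catch the two problems this section mentions: a binary that lost its execute permission, and a run script that was not edited in step B. This is a hypothetical helper; it treats the string MCR_CACHE_ROOT as evidence that the step B lines were added.

```shell
# check_matlab_job BINARY RUN_SCRIPT: hypothetical pre-submit check.
# Restores a lost execute bit and reports a run script that was not edited.
check_matlab_job() {
    binary="$1"       # e.g. foo
    run_script="$2"   # e.g. run_foo.sh
    status=0

    for f in "$binary" "$run_script"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        elif [ ! -x "$f" ]; then
            chmod +x "$f" && echo "restored execute permission on $f"
        fi
    done

    # The lines added in step B set MCR_CACHE_ROOT, so look for that name.
    if [ -f "$run_script" ] && ! grep -q 'MCR_CACHE_ROOT' "$run_script"; then
        echo "$run_script does not set MCR_CACHE_ROOT; was step B done?" >&2
        status=1
    fi

    return "$status"
}
```

For example, check_matlab_job foo run_foo.sh returns a nonzero status if either file is missing or the run script lacks the step B lines.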

2. Running Matlab Jobs

This section shows the important elements of a submit file for Matlab jobs. This submit file will be different from the one used to compile your code. As a starting point, see our "hello world" example: http://chtc.cs.wisc.edu/helloworld.shtml. In what follows, replace our example names foo and run_foo.sh with the names of your own binary and script.

  1. Use run_foo.sh as the executable:
    executable = run_foo.sh
  2. In order for your Matlab code to run, your job needs a Matlab runtime package, which it can download from CHTC's web proxy. The runtime version must match the version you used to compile your code. Options available on our proxy include:
    • r2015b.tar.gz
    • r2014b.tar.gz
    • r2013b.tar.gz
    To send the runtime package to your jobs, list a link to the appropriate version in your transfer_input_files line, as well as your compiled binary and any necessary input files:
    transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/r2015b.tar.gz,foo,input_files
  3. Include the appropriate arguments for run_foo.sh (as described in readme.txt). The first argument is the name of the Matlab runtime directory; any arguments your Matlab code needs follow it. The runtime directory names for the different versions are:
    Runtime package    Runtime directory name
    r2015b.tar.gz      v90
    r2014b.tar.gz      v84
    r2013b.tar.gz      v82

    So to run a Matlab job using r2015b and no additional arguments, the arguments line should read:
    arguments = v90
    If you are passing additional arguments to your Matlab code, list them after the runtime directory:
    arguments = v90 $(Cluster) $(Process)
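Putting these elements together, a complete submit file might look like the one this shell function writes. It is a sketch, not CHTC's template: the function name, the log/output/error file names and the request_* values are assumptions (the requests are placeholders to replace after testing), and should_transfer_files/when_to_transfer_output are the standard HTCondor file-transfer settings. The case statement encodes the runtime table above.

```shell
# write_matlab_submit RELEASE BINARY: write matlab.sub for a binary compiled
# with RELEASE (r2015b, r2014b or r2013b). Hypothetical helper, not a CHTC tool.
write_matlab_submit() {
    release="$1"   # must match the release used with mcc
    binary="$2"    # compiled binary, e.g. foo

    # Runtime directory names from the table above.
    case "$release" in
        r2015b) runtime_dir=v90 ;;
        r2014b) runtime_dir=v84 ;;
        r2013b) runtime_dir=v82 ;;
        *) echo "unknown release: $release" >&2; return 1 ;;
    esac

    # Unquoted EOF: the shell fills in $release, $binary and $runtime_dir,
    # while \$(Cluster) and \$(Process) are left for HTCondor to expand.
    cat > matlab.sub <<EOF
universe = vanilla
log = matlab_\$(Cluster).log
output = matlab_\$(Cluster)_\$(Process).out
error = matlab_\$(Cluster)_\$(Process).err

executable = run_$binary.sh
arguments = $runtime_dir

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/$release.tar.gz,$binary

# Placeholder requests: the runtime package alone is at least 1.5 GB.
request_cpus = 1
request_memory = 2GB
request_disk = 4GB

# Start with a few test jobs.
queue 3
EOF
}
```

For example, write_matlab_submit r2015b foo writes matlab.sub, and condor_submit matlab.sub then submits three test jobs. If your code takes extra arguments, append them to the arguments line after the runtime directory.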

As always, run a few test jobs and check their disk space and memory usage so that the requests for your full batch are accurate! The runtime package is large (at least 1.5 GB), so your disk request must leave room for it. After a job completes, its disk space and memory usage can be found in its log file.
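To pull those numbers out of a log file, grep is enough. This sketch assumes the log format of recent HTCondor versions, in which the job-terminated event ends with a resource table that has Usage, Request and Allocated columns and rows labelled Disk (KB) and Memory (MB); the function name is hypothetical.

```shell
# show_resource_usage LOGFILE: print the column header and the disk and memory
# rows of the resource table HTCondor writes to the log when a job finishes.
show_resource_usage() {
    grep -E 'Usage|Disk \(KB\)|Memory \(MB\)' "$1"
}
```

Compare the Usage column with the Request column for each test job, then set request_disk and request_memory for the full batch with some headroom above the largest usage you saw.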

Ensuring Randomness

This section is only relevant for Matlab scripts that use Matlab's random number functions like rand.

Every time Matlab starts, its random number generator begins from the same default state. When you run multiple Matlab jobs, each job starts its own fresh copy of Matlab, so every job begins with the same random number generator state and, unless you change that state, produces identical random numbers.

There are different ways to ensure that each job uses different random numbers. This MathWorks page describes one way to "reset" the random number generator so that it produces different random values each time Matlab starts. Another option is to deliberately choose a different random seed for each job yourself and pass it to your code as an argument.
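A minimal sketch of the second approach: generate one random seed per job on the submit server, save the seeds in a file (which also records them for reproducing results later), and let HTCondor's "queue ... from" statement hand one seed to each job. The file names are hypothetical, v90 and r2015b assume the 2015b runtime, and the Matlab side is an assumption: foo.m would need to be a function that accepts the seed, which arrives as text, and calls something like rng(str2double(seed)).

```shell
# Hypothetical file names: seeds.txt (one seed per job) and matlab_seeds.sub.
n_jobs=10
: > seeds.txt    # start with an empty seed file

i=0
while [ "$i" -lt "$n_jobs" ]; do
    # Read 4 random bytes as one unsigned integer (0 to 2^32 - 1, the range
    # Matlab's rng accepts) and keep only the digits and the newline.
    od -An -N4 -tu4 /dev/urandom | tr -dc '0-9\n' >> seeds.txt
    i=$((i + 1))
done

# Quoted 'EOF' keeps $(seed) literal so that HTCondor, not the shell, expands it.
cat > matlab_seeds.sub <<'EOF'
executable = run_foo.sh
# v90 assumes the r2015b runtime; each job's seed follows as foo's first argument.
arguments = v90 $(seed)
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/r2015b.tar.gz,foo
log = seeds.log
# ...add the output, error and request_* lines from your regular submit file...
queue seed from seeds.txt
EOF
```

A simpler alternative is arguments = v90 $(Process), which guarantees distinct seeds within one submission but repeats the same seeds (0, 1, 2, ...) every time you submit; the random seeds above differ between submissions, and with only a handful of jobs a repeated seed is very unlikely.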