Running Matlab Jobs on CHTC
To best understand the information below, users should already have an understanding of:
- Using the command line to: navigate within directories, create/copy/move/delete files and directories, and run their intended programs (aka "executables").
- The CHTC’s Intro to Running HTCondor Jobs
Like most programs, Matlab is not installed on CHTC's high throughput
compute system. One way to run Matlab where it isn't installed is to compile Matlab
.m files into a binary file and run the binary by using
a set of files called the Matlab runtime. In order to run Matlab in
CHTC, it is therefore necessary to perform the following steps, which
are detailed in the guide below:
- Prepare your Matlab program
- Write a submit file that uses the compiled code and script
If your Matlab code depends on random number generation, using a function like
randperm, please see the section at the end of this guide on ensuring
that each job uses different random numbers.
Supported Versions of Matlab
A. Preparing Your Matlab Program
You can compile
.m files into a Matlab binary yourself by requesting
an interactive session on one of our build machines. The session is
essentially a job without an executable; you are the one running the
commands instead (in this case, to compile the code).
1. Start an Interactive Build Job
Start by uploading all of the Matlab code files (usually .m or
.mat files) that you need to run your code to the submit server.
If you have many Matlab code files (more than about five), it's a good idea to combine them into a
.tar.gz file (like a zip file), so that you can simply transfer the single
.tar.gz file for compiling the code. You can create a tar file by running this command:
tar -czf code.tar.gz files and folders
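As a concrete illustration of the tar command above, the following sketch bundles two Matlab files and a folder into code.tar.gz and then lists the archive to confirm what it contains. The file and folder names (script.m, helper.m, functions/) are placeholders created only for this demonstration, and the commands run in a scratch directory.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder Matlab files standing in for your own code.
mkdir functions
echo "% main script" > script.m
echo "% helper function" > helper.m
echo "% function kept in a subdirectory" > functions/util.m

# Bundle the files and the folder into one compressed archive.
tar -czf code.tar.gz script.m helper.m functions

# List the archive contents to confirm everything was included.
tar -tzf code.tar.gz
```

Listing the archive before uploading is a cheap way to catch a forgotten file or folder.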
Create the following special submit file on the submit server, calling
it something like build.sub:
# Matlab build file
universe = vanilla
log = interactive.log

# List all of your .m files, or a tar.gz file if you've combined them.
transfer_input_files = script.m, functions.tar.gz

+IsBuildJob = true
requirements = (OpSysMajorVer =?= 8)
request_cpus = 1
request_memory = 4GB
request_disk = 2GB

queue
Fill in the "transfer_input_files" line with your Matlab .m files, or a tar.gz file with all of the Matlab files your code uses.
Once this submit file is created, you will start the interactive job by running the following command:
[alice@submit]$ condor_submit -i build.sub
It may take a few minutes for the build job to start.
2. Compile Matlab Code and Exit Interactive Job
Once the interactive job has started, you can compile your code. In this
example, script.m represents the name of the primary Matlab script;
you should replace
script.m with the name of your own primary script.
Note that if your main script references other
.m files, as long as
they are present in the working directory, they will all be compiled
together with the main script into one binary.
If you combined your Matlab
.m files into one
.tar.gz file, make sure to "un-tar" that file before running the compiling steps below.
To access the Matlab compiler on the build node, you'll need to load the appropriate Matlab module. For Matlab 2015b, the module load command will look like this:
[alice@build]$ module load MATLAB/R2015b
If you want to use a different version of Matlab, change the name after the
module load command. Once the
module is loaded, run the compilation command:
[alice@build]$ mcc -m -R -singleCompThread -R -nodisplay -R -nojvm script.m
There are other options for the
mcc Matlab compiler that might be necessary for specific compiling situations. For example, if your main .m script uses a set of Matlab functions or .m files that are contained in a subdirectory (called, say,
functions), then your compiling command will need to use the
-a flag at the end of the command like so:
[alice@build]$ mcc -m \
    -R -singleCompThread -R -nodisplay -R -nojvm \
    script.m -a functions/
(The backslashes, \, are there just to break up the full command.)
Exit the interactive session after you have compiled your code:
[alice@build]$ exit
Condor will transfer your compiled code and its scripts back automatically.
Back on the submit node, you should now have the following files:
[alice@submit]$ ls -l
-rw-rw-r-- 1 user user 581724 Feb 19 14:21 mccExcludedFiles.log
-rwxrw-r-- 1 user user  94858 Feb 19 14:21 script
-rwxrw-r-- 1 user user   1024 Feb 19 14:00 script.m
-rw-rw-r-- 1 user user   3092 Feb 19 14:21 readme.txt
-rw-rw-r-- 1 user user 581724 Feb 19 14:21 requiredMCRProducts.txt
-rwxrw-r-- 1 user user   1195 Feb 19 14:21 run_script.sh
script is the compiled Matlab binary. You will not need the
readme.txt to run your jobs.
3. Modifying the Executable
The mcc command should have created a script called
run_*.sh, where * is the name of your Matlab script; our example uses the name
run_script.sh. The run_*.sh script will be the executable for your Matlab
jobs and already has almost all the necessary commands for running your
compiled code.
You'll need to add one line at the beginning of the run_*.sh script
that unpacks the Matlab runtime. We'll also add some extra options to
ensure Matlab runs smoothly on any Linux system.
The commands that need to be added, and their location, look like this (replace
r2015b.tar.gz with the appropriate version of Matlab, if
you used something different to compile):
#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MATLAB Runtime environment for the current $ARCH and executes
# the specified command.

# Add these lines to run_script.sh
tar -xzf r2015b.tar.gz
mkdir cache
export MCR_CACHE_ROOT=$PWD/cache

# Rest of script follows
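If you would rather not edit the wrapper by hand, the same three lines can be added with a short command. The sketch below is one possible way to do it, not part of the official procedure: it works in a scratch directory on a stand-in run_script.sh (created here only for the demonstration) and inserts the lines directly after the first line of the file. That placement differs slightly from the listing above, which puts them after the comment header, but the effect is the same because the header contains only comments.

```shell
# Work in a scratch directory with a stand-in wrapper; on the submit server
# you would run the awk step against the real run_script.sh instead.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' '#!/bin/sh' '# stand-in for the wrapper generated by mcc' \
    'echo "wrapper body"' > run_script.sh

# Copy the wrapper, adding the extra lines right after line 1 (the shebang).
awk 'NR == 1 {
    print
    print "tar -xzf r2015b.tar.gz"
    print "mkdir cache"
    print "export MCR_CACHE_ROOT=$PWD/cache"
    next
}
{ print }' run_script.sh > run_script.sh.new

# Replace the original and keep it executable.
mv run_script.sh.new run_script.sh
chmod +x run_script.sh

# Show the first lines to confirm the edit.
head -n 5 run_script.sh
```

As with the manual edit, replace r2015b.tar.gz with the runtime that matches the Matlab version you compiled with.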
B. Running Matlab Jobs
This section shows the important elements of creating a submit file for
Matlab jobs. The submit file for your job will be different than the one
used to compile your code. As a starting point for a submit file, see
our "hello world" example:
http://chtc.cs.wisc.edu/helloworld. In what
follows, replace our example
script and run_script.sh with the names
of your binary and scripts.
Use run_script.sh as the executable:
executable = run_script.sh
In order for your Matlab code to run, you will need to use a Matlab runtime package. This package is easily downloaded from CHTC's web proxy; the version must match the version you used to compile your code. Options available on our proxy include:
To send the runtime package to your jobs, list a link to the appropriate version in your
transfer_input_files line, as well as your compiled binary and any necessary input files:
transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/r2015b.tar.gz,script,input_data
run_script.sh will expect the runtime directory name to be provided as an argument specified in the submit file (as described in
Matlab version Runtime directory name
So to run a Matlab job using
r2015b and no additional arguments, the arguments line in the submit file should read:
arguments = v90
If you are passing additional arguments to the script, they can go after the first "runtime" argument:
arguments = v90 $(Cluster) $(Process)
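To make the argument order concrete, here is a simplified stand-in for how a wrapper of this kind consumes its arguments: the first is taken as the runtime directory name and the remainder are handed on to the compiled binary. This is an illustration only, not the actual contents of run_script.sh; the function name demo_wrapper and the sample cluster and process values (1234 and 0) are hypothetical.

```shell
# Simplified stand-in for the wrapper's argument handling (hypothetical function).
demo_wrapper() {
    runtime_dir=$1   # first argument: runtime directory name, e.g. v90
    shift            # everything left is meant for the compiled binary
    echo "runtime directory: $runtime_dir"
    echo "arguments for the binary: $*"
}

# Equivalent of "arguments = v90 $(Cluster) $(Process)" for cluster 1234, process 0.
demo_wrapper v90 1234 0
```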
If you are passing numerical values as arguments to your Matlab binary, you will need to revise your Matlab code so that the values are interpreted as numbers instead of as characters (the default). To do this, you can use the Matlab
str2num function; more information is available in the Matlab documentation for str2num.
- As always, test a few jobs for disk space/memory usage in order to make sure your requests for a large batch are accurate! Disk space and memory usage can be found in the log file after the job completes. If you are using Matlab 2018b, request at least 5.5GB of DISK as the runtime is very large for this version of Matlab.
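Pulling the pieces of this section together, a complete submit file might look like the sketch below, written here as a shell snippet that creates the file in a scratch directory. The file name matlab_job.sub, the log, output and error file names, and the memory and disk requests are placeholder assumptions to adjust after testing; input_data stands for whatever inputs your code needs, as in the transfer_input_files example above.

```shell
# Work in a scratch directory for the demonstration.
cd "$(mktemp -d)"

# Write an example submit file; resource requests and file names are placeholders.
cat > matlab_job.sub <<'EOF'
# Matlab job submit file (sketch)
universe = vanilla
executable = run_script.sh
arguments = v90 $(Cluster) $(Process)

transfer_input_files = http://proxy.chtc.wisc.edu/SQUID/r2015b.tar.gz,script,input_data

log = job_$(Cluster).log
output = job_$(Cluster)_$(Process).out
error = job_$(Cluster)_$(Process).err

request_cpus = 1
request_memory = 2GB
request_disk = 4GB

queue 1
EOF

# Print the file for review before submitting it with condor_submit.
cat matlab_job.sub
```

The heredoc delimiter is quoted ('EOF') so that the shell leaves $(Cluster) and $(Process) untouched for HTCondor to expand.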
This section is only relevant for Matlab scripts that use Matlab's
random number functions like randperm.
Whenever Matlab is started for the first time on a new computer, the random number generator begins from the same state. When you run multiple Matlab jobs, each job is using a copy of Matlab that is being used for the first time, so every job will start from the same random number generator state and produce identical results.
There are different ways to ensure that each job is using different randomly generated numbers. This Mathworks page describes one way to "reset" the random number generator so that it produces different random values when Matlab runs for the first time. Deliberately choosing your own different random seed values for each job can be another way to ensure different results.
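One way to choose distinct seeds deliberately is to generate a list of them ahead of time and hand one to each job. The sketch below writes ten different seed values to a file; the file name seeds.txt, the base value 1000, and the job count are arbitrary choices for illustration. How the seed is consumed is up to your setup: for example, it could be passed to the job as an extra argument, converted to a number, and applied in your Matlab code with the rng function.

```shell
# Work in a scratch directory for the demonstration.
cd "$(mktemp -d)"

n_jobs=10        # number of jobs you plan to submit (assumed value)
base_seed=1000   # arbitrary starting seed (assumed value)

# Write one distinct seed per line.
i=0
: > seeds.txt
while [ "$i" -lt "$n_jobs" ]; do
    echo $((base_seed + i)) >> seeds.txt
    i=$((i + 1))
done

# Show the seeds that were generated.
cat seeds.txt
```

Because $(Process) is already different for each job within a cluster, it can serve the same purpose without a separate file.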