Create a Portable Python Installation with Miniconda
Quickstart: Conda
Option A (recommended)
Build a container with Conda packages installed inside:
- How to build your own container
- Example container recipes for Conda
- Use your container in your HTC jobs
Option B
Create your own portable copy of your Conda packages:
- Follow the instructions in our guide
This approach may be sensitive to the operating system of the execution point. We recommend building a container instead, but are keeping these instructions as a backup.
More information
The above instructions are intended for cases where one or more of your packages must be installed using conda install.
Miniconda can be used to install Python and R and their corresponding packages.
However, if you only need Python or R, and do not otherwise need a conda install command to set up your packages,
see the instructions specifically for setting up Python or R, because there is less chance of obscure errors when building your container.
When building or using a Miniconda container, you do not need to create or activate a conda environment.
For the build process, you skip directly to the conda install commands you want to run.
Similarly, when executing a script in a Miniconda container, the packages are loaded when the container starts.
Executable
If you are planning to execute a Python .py script using your Miniconda container, you can follow the instructions in the Python guide.
If you are planning to execute a .R script using your Miniconda container, you can follow the instructions in the R guide.
Otherwise, you can use a bash .sh script as the submit file executable:
#!/bin/bash
<your commands go here>
where the contents of the file are the commands that you want to execute using your conda environment. You do not and should not try to activate the conda environment in the executable if you are using a container.
Specifying Exact Dependency Versions
An important part of improving reproducibility and consistency between runs is to ensure that you use the correct/expected versions of your dependencies.
When you run a command like conda install numpy, conda tries to install
the most recent version of numpy. For example, numpy version 1.18.2
was released on March 17, 2020. To install exactly this version of numpy, you
would run conda install numpy=1.18.2 (the same works for pip, if you replace = with ==).
We recommend installing with an explicit version to make sure you have exactly
the version of a package that you want. This is often called
“pinning” or “locking” the version of the package.
If you want a record of what is installed in your environment, or want to
reproduce your environment on another computer, conda can create a file, usually
called environment.yml, that describes the exact versions of all of the
packages you have installed in an environment.
This file can be re-used by a different conda command to recreate that
exact environment on another computer.
To create an environment.yml file from your currently activated environment, run
[alice@submit]$ conda env export > environment.yml
This environment.yml will pin the exact version of every dependency in your
environment. This can sometimes be problematic if you are moving between
platforms because a package version may not be available on some other platform,
causing an “unsatisfiable dependency” or “inconsistent environment” error.
A much less strict pinning is
[alice@submit]$ conda env export --from-history > environment.yml
which only lists packages that you installed manually, and does not pin their
versions unless you yourself pinned them during installation.
If you need an intermediate solution, it is also possible to manually edit
environment.yml files; see the conda environment documentation
for more details about the format and what is possible.
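As an illustration of such an intermediate solution, the sketch below writes a hand-edited environment.yml that pins only selected packages. The environment name science, the python=3.8 pin, and the unpinned matplotlib entry are assumptions made for this example and have not been checked for solvability; numpy=1.18.2 is the version discussed above.

```shell
# Write a hand-edited environment.yml (names and versions are illustrative).
cat > environment.yml <<'EOF'
name: science        # assumed environment name
dependencies:
  - python=3.8       # assumed Python version
  - numpy=1.18.2     # pinned, as in the example above
  - matplotlib       # left unpinned, so conda chooses a compatible version
EOF
```

Pinning only the packages your results depend on keeps the file readable while leaving conda room to resolve the remaining dependencies on a different platform.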
In general, exact environment specifications are simply not guaranteed to be
transferable between platforms (e.g., between Windows and Linux).
We strongly recommend using the strictest possible pinning available to you.
To create an environment from an environment.yml file, run
[alice@submit]$ conda env create -f environment.yml
By default, the name of the environment will be whatever the name of the source
environment was; you can change the name by adding a -n <name> option to the
conda env create command.
If you use a source control system like git, we recommend checking your
environment.yml file into source control and making sure to recreate it
when you make changes to your environment.
Putting your environment under source control gives you a way to track how it
changes along with your own code.
If you are developing software on your local computer for eventual use on the CHTC pool, your workflow might look like this:
- Set up a conda environment for local development and install packages as desired (e.g., conda create -n science; conda activate science; conda install numpy).
- Once you are ready to run on the CHTC pool, create an environment.yml file from your local environment (e.g., conda env export > environment.yml).
- Move your environment.yml file from your local computer to the submit machine and create an environment from it (e.g., conda env create -f environment.yml), then pack it for use in your jobs, as described in Create Software Package.
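The workflow above can be sketched as shell commands. This is only an outline under stated assumptions: the login alice@submit.example.edu is a hypothetical placeholder, the environment is called science as in the example, and the -n option is used in place of conda activate so that the commands also work in a non-interactive script. The commands are wrapped in functions so that nothing runs until you call one of them.

```shell
# Sketch only: nothing runs until you call one of these functions.

# Steps 1 and 2, on your local computer.
local_steps() {
    conda create -n science
    conda install -n science numpy
    conda env export -n science > environment.yml
}

# Step 3, copying the file (hypothetical login and hostname).
copy_steps() {
    scp environment.yml alice@submit.example.edu:
}

# Step 3 continued, on the submit machine (requires conda-pack to be installed).
submit_steps() {
    conda env create -f environment.yml
    conda pack -n science --dest-prefix='$ENVDIR'
}
```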
More information on conda environments can be found in their documentation.
Option B: Create your own portable copy
1. Create a Miniconda installation
On the submit server, download the latest Linux Miniconda installer and run it.
[alice@submit]$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
[alice@submit]$ sh Miniconda3-latest-Linux-x86_64.sh
Accept the license agreement and default options. At the end, you can choose whether or not to “initialize Miniconda3 by running conda init?” We recommend that you enter “yes”. Once you’ve completed the installer, you’ll be prompted to restart your terminal. Log out and log back in, and conda will be ready to use to set up your software.
If you choose “no”, you’ll want to save the eval command shown by the installer so that you can reactivate the Miniconda installation when needed in the future.
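For reference, the eval command printed by the installer typically has the shape shown below. The path assumes the default install location of $HOME/miniconda3; if your installer printed a different command, use that one instead.

```shell
# Assumed default install location; adjust if you chose a different prefix.
CONDA_BIN="$HOME/miniconda3/bin/conda"

if [ -x "$CONDA_BIN" ]; then
    # Load conda's shell functions into the current bash session.
    eval "$("$CONDA_BIN" shell.bash hook)"
else
    echo "conda not found at $CONDA_BIN" >&2
fi
```

After the eval line runs, the conda command and the (base) prompt should be available in that terminal session.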
2. Create a conda “environment” with your software
(If you are using an environment.yml file as described in Specifying Exact Dependency Versions, you should instead create the environment from your environment.yml file. If you don’t have an environment.yml file to work with, follow the install instructions in this section. We recommend switching to the environment.yml method of creating environments once you understand the “manual” method presented here.)
Make sure that you’ve activated the base Miniconda environment if you haven’t already. Your prompt should look like this:
(base)[alice@submit]$
To create an environment, use the conda create command and then activate the environment:
(base)[alice@submit]$ conda create -n env-name
(base)[alice@submit]$ conda activate env-name
Then, run the conda install command to install the different packages and
software you want to include in the installation. The exact command is often
listed in the installation instructions for the software (e.g., QIIME 2, PyTorch).
(env-name)[alice@submit]$ conda install pkg1 pkg2
Some Conda packages are only available via specific Conda channels, which serve as repositories for hosting and managing packages. If Conda is unable to locate the requested packages using the example above, you may need to have Conda search other channels. More details are available at https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html.
Packages may also be installed via pip, but you should only do this
when there is no conda package available.
Once everything is installed, deactivate the environment to go back to the Miniconda “base” environment.
(env-name)[alice@submit]$ conda deactivate
For example, if you wanted to create an installation with pandas and
matplotlib and call the environment py-data-sci, you would use this sequence
of commands:
(base)[alice@submit]$ conda create -n py-data-sci
(base)[alice@submit]$ conda activate py-data-sci
(py-data-sci)[alice@submit]$ conda install pandas matplotlib
(py-data-sci)[alice@submit]$ conda deactivate
(base)[alice@submit]$
More about Miniconda
See the official conda documentation for more information on creating and managing environments with conda.
3. Create Software Package
Make sure that your job’s Miniconda environment is created, but deactivated, so that you’re in the “base” Miniconda environment:
(base)[alice@submit]$
Then, run this command to install the conda pack tool:
(base)[alice@submit]$ conda install -c conda-forge conda-pack
Enter y when it asks you to install.
Finally, use conda pack to create a zipped tar.gz file of your environment
(substitute the name of your conda environment where you see env-name),
set the proper permissions for this file using chmod, and check the size of
the final tarball:
(base)[alice@submit]$ conda pack -n env-name --dest-prefix='$ENVDIR'
(base)[alice@submit]$ chmod 644 env-name.tar.gz
(base)[alice@submit]$ ls -sh env-name.tar.gz
When this step finishes, you should see a file in your current directory named env-name.tar.gz.
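Before moving on, it can be worth confirming that the tarball contains the bin/activate script that the job executable sources after unpacking. The helper below is a minimal sketch; the function name check_packed_env is hypothetical and not part of conda or conda pack.

```shell
# Hypothetical helper: confirm that a packed environment contains bin/activate.
check_packed_env() {
    # tar -tzf lists the archive members without extracting them.
    if tar -tzf "$1" | grep -Eq '^(\./)?bin/activate$'; then
        echo "ok: bin/activate found in $1"
    else
        echo "missing: bin/activate not found in $1" >&2
        return 1
    fi
}

# Usage: check_packed_env env-name.tar.gz
```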
4. Check Size of Conda Environment Tar Archive
The tar archive, env-name.tar.gz, created in the previous step will be used as input for
subsequent job submission. As with all job input files, you should check the size of this
Conda environment file. If it is larger than 100MB, you should NOT transfer the tarball using
transfer_input_files. Instead, you should plan to use either CHTC’s web proxy, SQUID, or our
large data filesystem, Staging. Please contact a research computing facilitator at
chtc@cs.wisc.edu to determine the best option for your jobs.
More information is available at File Availability with Squid Web Proxy and Managing Large Data in HTC Jobs.
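The size check can also be scripted. The sketch below is a hypothetical helper, not a CHTC tool; it assumes that “100MB” means 100 * 1024 * 1024 bytes, which may differ slightly from the threshold CHTC applies.

```shell
# Hypothetical helper: compare a file's size against the 100MB guideline.
# Assumption: "100MB" is taken here as 100 * 1024 * 1024 bytes.
LIMIT_BYTES=$((100 * 1024 * 1024))

transfer_method() {
    # wc -c reports the size in bytes; the arithmetic expansion strips padding.
    size=$(( $(wc -c < "$1") ))
    if [ "$size" -gt "$LIMIT_BYTES" ]; then
        echo "too large for transfer_input_files: ask about SQUID or Staging"
    else
        echo "small enough for transfer_input_files"
    fi
}

# Usage: transfer_method env-name.tar.gz
```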
5. Create a Job Executable
The job will need to go through a few steps to use this “packed” conda environment:
first, setting the PATH, then unzipping the environment, then activating it,
and finally running whatever program you like. The script below is an example
of what is needed (customize as indicated to match your choices above).
#!/bin/bash
# have job exit if any command returns with non-zero exit status (aka failure)
set -e
# replace env-name on the right hand side of this line with the name of your conda environment
ENVNAME=env-name
# if you need the environment directory to be named something other than the environment name, change this line
export ENVDIR=$ENVNAME
# these lines handle setting up the environment; you shouldn't have to modify them
export PATH
mkdir $ENVDIR
tar -xzf $ENVNAME.tar.gz -C $ENVDIR
. $ENVDIR/bin/activate
# modify this line to run your desired Python script and any other work you need to do
python3 hello.py
6. Submit Jobs
In your submit file, make sure to have the following:
- Your executable should be the bash script you created in step 5.
- Remember to transfer your Python script and the environment tar.gz file via transfer_input_files. Since the tar.gz file will almost certainly be larger than 100MB, please email us about different tools for delivering the installation to your jobs, likely our SQUID web proxy.
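Putting these pieces together, a minimal submit file might look like the sketch below. The file names run_conda.sh and conda_job.sub and the resource requests are illustrative assumptions, and the example assumes the tarball is under 100MB so that transfer_input_files is appropriate; for a larger tarball, the delivery method should be the one worked out with the facilitators.

```shell
# Write a minimal submit file (file names and resource requests are illustrative).
cat > conda_job.sub <<'EOF'
# Bash script from step 5 (hypothetical file name)
executable = run_conda.sh

# Assumes the tarball is under 100MB; otherwise ask about SQUID or Staging
transfer_input_files = hello.py, env-name.tar.gz

log = job.log
output = job.out
error = job.err

# Illustrative requests; the disk request must cover the tarball plus the unpacked environment
request_cpus = 1
request_memory = 2GB
request_disk = 4GB

queue
EOF
```

You would then submit the job with condor_submit conda_job.sub.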