This is an example of how to run jobs on the LPC queue with condor while using a Singularity container, specifically one that has coffea, awkward, and uproot installed.
I also needed a module of my own design (hepfile, FWIW), so I had to make a virtual environment to copy into my jobs. You might have your own bespoke modules/libraries, so swap in whatever you need.
This example launches a simple script which opens a file with uproot (XRootD path) and exits.
The virtual environment has to be created inside a local Singularity container, so that it is built on the same Python the jobs will run with. So fire up a shell in the container!
singularity shell -B ${PWD}:/srv /cvmfs/unpacked.cern.ch/registry.hub.docker.com/coffeateam/coffea-dask:latest
cd /srv
That last cd command is there because Singularity dropped me into my home directory. The -B ${PWD}:/srv option bind-mounts the directory I launched Singularity from onto /srv inside the container, so after cd /srv I'm back in the directory where I wanted to be.
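If you want a guard that the next steps really run inside the container, a small check like the one below can help. It is a minimal sketch, not part of the Gist: it assumes that Singularity exports SINGULARITY_CONTAINER (and Apptainer exports APPTAINER_CONTAINER) inside a running container, which you should confirm for your version. The function name in_singularity is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumption: these variables are set inside a running Singularity/Apptainer
# container; verify against the documentation for the version you use.
CONTAINER_ENV_VARS = ("SINGULARITY_CONTAINER", "APPTAINER_CONTAINER")


def in_singularity(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment looks like it is inside a container."""
    if env is None:
        env = os.environ
    return any(var in env for var in CONTAINER_ENV_VARS)


if __name__ == "__main__":
    if in_singularity():
        print("Inside a container: OK to build the virtual environment here.")
    else:
        print("Not inside a container: start the Singularity shell first.")
```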
On to the virtual environment! You'll be doing this inside the Singularity shell, so once the environment was activated my prompt read (myenv) Singularity>.
python -m venv --without-pip --system-site-packages myenv
source myenv/bin/activate
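This creates a virtual environment called myenv, stored in a directory also called myenv, where any new modules that I install will live. The --system-site-packages flag keeps the packages already installed in the image (coffea, awkward, uproot) visible inside the environment, and --without-pip skips installing a separate copy of pip into it, so python -m pip uses the image's pip instead.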
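For reference, the same environment can be built from Python with the standard-library venv module. This is a minimal sketch rather than what the Gist does: make_env is a hypothetical helper, and symlinks is set to mirror what python -m venv does by default on Linux.

```python
import os
import venv
from pathlib import Path


def make_env(path: str = "myenv") -> Path:
    """Create an env like `python -m venv --without-pip --system-site-packages myenv`."""
    builder = venv.EnvBuilder(
        # --system-site-packages: keep seeing coffea, awkward, uproot from the image
        system_site_packages=True,
        # --without-pip: no pip inside the env; `python -m pip` falls back
        # to the image's pip through the system site-packages
        with_pip=False,
        # python -m venv links (rather than copies) the interpreter on Linux
        symlinks=(os.name != "nt"),
    )
    builder.create(path)
    return Path(path)
```

Calling make_env() from the Singularity shell leaves you in the same place as the python -m venv command; you still run source myenv/bin/activate afterwards.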
Now let me install a few modules!
python -m pip install --ignore-installed h5py
python -m pip install --ignore-installed hepfile
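You may not need all these modules, but since this is my Gist, I'm putting in the ones I need. :) The --ignore-installed flag makes pip put a fresh copy into myenv even if the package, or one of its dependencies, is already present in the image.
Note that coffea already exists in the Singularity image we'll be using on the cluster, so I don't need to install it here.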
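Before or after installing, it can help to confirm that the python you are running really is the one in myenv. This sketch uses only the standard library and relies on the fact that inside a virtual environment sys.prefix points at the environment while sys.base_prefix still points at the underlying installation. The helper names are hypothetical.

```python
import sys
import sysconfig
from typing import Optional


def active_venv() -> Optional[str]:
    """Return the prefix of the active virtual environment, or None."""
    # In a venv, sys.prefix is the env directory while sys.base_prefix
    # still points at the base installation (here, the container's Python).
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


def site_packages() -> str:
    """Pure-Python site-packages directory reported for this interpreter."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print("venv:", active_venv())
    print("site-packages:", site_packages())
```

Run from the (myenv) Singularity> prompt, active_venv() should return the path of myenv and site_packages() should sit underneath it. If active_venv() returns None, the activate step did not take effect, and pip will not install into myenv.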
We want to copy this over to our jobs, so let's create (c) a file (f) that is gzip-compressed (z), and let's use the verbose (v) option too. (Note that the f has to come last in the bundle of options, because the archive filename must come immediately after it.)
tar -zcvf myenv.tgz myenv
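The tar command above has a direct standard-library equivalent, sketched here in case you would rather script the packing step. pack_env is a hypothetical name; the filter callback plays the role of v by printing each member as it is added.

```python
import tarfile
from pathlib import Path


def _verbose(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Print each member as it is archived (like tar's -v)."""
    print(info.name)
    return info


def pack_env(env_dir: str = "myenv", archive: str = "myenv.tgz") -> Path:
    """Python equivalent of `tar -zcvf myenv.tgz myenv`."""
    with tarfile.open(archive, "w:gz") as tar:  # c + z + f
        # arcname keeps the top-level directory named after the env, as tar does
        tar.add(env_dir, arcname=Path(env_dir).name, filter=_verbose)
    return Path(archive)
```

Like tar, tarfile stores symlinks as symlinks by default, so any links venv created inside myenv/bin stay links rather than being copied.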
In addition to my myenv.tgz file, I have the following .jdl condor script, .sh bash script (to run on the queue), and .py python script. The bash script runs on the queue and calls the python script.
I also create a logs subdirectory here, because I've coded it up that way in the .jdl file.
run_singularity_bespoke_demo.jdl
singularity_bespoke_demo.sh
singularity_bespoke_demo.py
logs/
All these files are provided in the Gist.
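Because the .jdl expects these files and the logs directory to be next to it, a quick pre-submit check can save a failed or held job. This is a hypothetical helper, not part of the Gist, using the file names listed above; adjust the list if yours differ.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above; edit to match your own job.
REQUIRED = [
    "myenv.tgz",
    "run_singularity_bespoke_demo.jdl",
    "singularity_bespoke_demo.sh",
    "singularity_bespoke_demo.py",
]


def missing_inputs(workdir: str = ".") -> List[str]:
    """List job inputs (and the logs/ directory) missing from workdir."""
    base = Path(workdir)
    missing = [name for name in REQUIRED if not (base / name).is_file()]
    # The .jdl writes its logs into logs/, which needs to exist beforehand
    if not (base / "logs").is_dir():
        missing.append("logs/")
    return missing


if __name__ == "__main__":
    problems = missing_inputs()
    print("ready to submit" if not problems else f"missing: {problems}")
```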
Note that for this example, I'm hardcoding the path to a ROOT file, with the appropriate XRootD prefix prepended.
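The XRootD prefix is a redirector URL placed in front of the file's logical path. A sketch of building it is below. The redirector root://cmsxrootd.fnal.gov/ is an assumption (a commonly used FNAL redirector; root://cms-xrd-global.cern.ch/ is the CMS global one), and the example path is made up for illustration.

```python
# Assumption: FNAL's redirector; root://cms-xrd-global.cern.ch/ is the
# CMS-wide global redirector if the file isn't at FNAL.
DEFAULT_REDIRECTOR = "root://cmsxrootd.fnal.gov/"


def xrootd_url(lfn: str, redirector: str = DEFAULT_REDIRECTOR) -> str:
    """Prepend an XRootD redirector to a logical file name such as /store/..."""
    if not lfn.startswith("/"):
        raise ValueError(f"expected an absolute logical path, got {lfn!r}")
    # XRootD URLs conventionally have a double slash: host + "/" + absolute path
    return redirector.rstrip("/") + "/" + lfn


if __name__ == "__main__":
    # Hypothetical path, for illustration only
    print(xrootd_url("/store/user/someone/example.root"))
```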
Now I submit the .jdl script from a non-Singularity environment.
condor_submit run_singularity_bespoke_demo.jdl
If this runs correctly, it will import coffea (from Singularity) and hepfile (from myenv). It then uses coffea to open the ROOT file and extract the number of events, which it prints to stdout before exiting.
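To confirm that each module comes from where you expect (coffea from the image, hepfile from myenv), the job could print where Python finds them. A minimal sketch using importlib.util.find_spec, which locates a top-level module without importing it; module_origins is a hypothetical helper name.

```python
import importlib.util
from typing import Dict, Iterable, Optional


def module_origins(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to the file it would load from (None if absent)."""
    origins: Dict[str, Optional[str]] = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    for name, origin in module_origins(["coffea", "hepfile"]).items():
        print(f"{name}: {origin}")
```

On a working job you would expect hepfile's path to contain myenv/, while coffea's points into the container's own site-packages.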
You can see the output in the files written to your logs subdirectory.
Hi,
Thank you for writing this gist. It is very detailed and informative.
I am trying to add python packages to my coffea environment similarly to how this gist works. As a first step, I am trying to run the example code you have in the gist exactly based on your instructions, and I am running into an issue.
When your bash script is running on condor, it gives me this error message:
After some debugging, it looks like the source myenv/bin/activate line doesn't change which python I am using. Have you seen a similar issue when setting this up? Do you have an idea of what I'm missing? Any help you can give will be much appreciated.
Best,
Keane