
@mattbellis
Last active January 26, 2022 04:38
Running jobs on Condor at the LPC with Singularity and coffea or other bespoke libraries

This is an example of how to run jobs on the LPC queue with Condor while using a Singularity container, specifically one that has coffea, awkward, and uproot installed.

I also needed a module of my own design (hepfile, FWIW), so I made a virtual environment to copy into my jobs. You might have your own bespoke modules/libraries, so swap in whatever you need.

This example launches a simple script that opens a ROOT file over XRootD with coffea (which reads it with uproot), prints the number of events, and exits.

Create a Python venv

Launch Singularity

We need to create this environment inside a local Singularity container running the same image the jobs will use, so that myenv is built against the same Python. So fire it up!

singularity shell -B ${PWD}:/srv /cvmfs/unpacked.cern.ch/registry.hub.docker.com/coffeateam/coffea-dask:latest
cd /srv

I need that last cd because Singularity starts me off in my home directory. The -B ${PWD}:/srv option bind-mounts the directory I launched from at /srv inside the container, so cd /srv puts me back where I started, which is where I want to be.

venv all the things!

To the virtual environment! You'll be doing this inside the Singularity container, so your prompt should read Singularity> (once the environment is activated, mine read (myenv) Singularity>).

python -m venv --without-pip --system-site-packages myenv
source myenv/bin/activate

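This creates a virtual environment called myenv, in a directory of the same name where any new modules I install will live. The --system-site-packages flag lets the environment still see everything already installed in the image (coffea, awkward, uproot), and --without-pip skips putting a separate copy of pip into myenv; python -m pip then uses the image's pip, which installs into the active environment.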
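If you'd rather script this step, Python's standard-library venv module can build an equivalent environment. This is a sketch, assuming you run it with the image's python from /srv inside the container; make_env is a hypothetical helper name, and symlinks=True mirrors what the command-line tool does by default on Linux.

```python
import venv
from pathlib import Path


def make_env(env_dir="myenv"):
    """Roughly `python -m venv --without-pip --system-site-packages myenv`."""
    venv.create(
        env_dir,
        system_site_packages=True,  # keep coffea, awkward, uproot from the image visible
        with_pip=False,             # same as --without-pip: reuse the image's pip
        symlinks=True,              # the CLI default on Linux: bin/python links to the image's python
    )
    return Path(env_dir) / "pyvenv.cfg"
```

Calling make_env() and printing the contents of the returned pyvenv.cfg should show include-system-site-packages = true.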

Now let me install a few modules!

python -m pip install --ignore-installed h5py
python -m pip install --ignore-installed hepfile

You may not need all these modules, but since this is my Gist, I'm putting in the ones I need. :)

Note that coffea already exists in the Singularity image we'll be using on the cluster, so I don't need to install it here.
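To double-check that pip put h5py and hepfile inside myenv rather than somewhere else, you can look in myenv's site-packages. A small sketch (installed_in_env is a hypothetical helper, not part of any library); it globs lib/python*/site-packages so you don't need to know the exact Python version in the image.

```python
from pathlib import Path


def installed_in_env(package, env_dir="myenv"):
    """List site-packages entries in env_dir whose names start with `package`.

    Matches both the package directory and its *.dist-info metadata."""
    hits = []
    for site in Path(env_dir).glob("lib/python*/site-packages"):
        hits.extend(sorted(
            p.name for p in site.iterdir()
            if p.name.lower().startswith(package.lower())
        ))
    return hits
```

For example, for pkg in ("h5py", "hepfile"): print(pkg, installed_in_env(pkg)). An empty list suggests that package was installed somewhere other than myenv.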

Tar it up!

We want to copy this over to our jobs, so let's create (c) a gzip-compressed (z) archive file (f), with verbose (v) output. (Note that the f has to come last in the bundle of options, because the archive filename must come immediately after it.)

tar -zcvf myenv.tgz myenv
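The same archive can also be made from Python with the standard-library tarfile module, if that's handier. A sketch, assuming it runs from the directory that contains myenv (tar_env is a hypothetical helper); like tar, it keeps symlinks such as myenv/bin/python as symlinks rather than copying the interpreter.

```python
import tarfile
from pathlib import Path


def tar_env(env_dir="myenv", out="myenv.tgz"):
    """Like `tar -zcvf myenv.tgz myenv`; returns the archived names (the 'verbose' part)."""
    with tarfile.open(out, "w:gz") as tar:            # "w:gz" = create (c) + gzip (z)
        tar.add(env_dir, arcname=Path(env_dir).name)  # recursive; stored as myenv/...
    with tarfile.open(out, "r:gz") as tar:
        return tar.getnames()
```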

Submit the jobs!

In addition to myenv.tgz, I have the following files: a .jdl Condor submit file, a .sh bash script (which runs on the queue), and a .py Python script. Condor runs the bash script on the worker node, and the bash script calls the Python script.

I also create a logs subdirectory here, because the .jdl file writes the job's output and log files there.

run_singularity_bespoke_demo.jdl  
singularity_bespoke_demo.sh  
singularity_bespoke_demo.py
logs/

All these files are provided in the Gist.
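Because the .jdl expects logs/ to exist already, a tiny pre-flight check can save you a confusing failure. This sketch (prepare_submit_dir is hypothetical, and the file list is just the one above) creates logs/ if needed and reports anything missing before you run condor_submit.

```python
from pathlib import Path

# File names taken from the listing above; adjust if yours differ.
REQUIRED = [
    "run_singularity_bespoke_demo.jdl",
    "singularity_bespoke_demo.sh",
    "singularity_bespoke_demo.py",
    "myenv.tgz",
]


def prepare_submit_dir(workdir="."):
    """Create logs/ if it is missing and return the names of any missing input files."""
    work = Path(workdir)
    (work / "logs").mkdir(exist_ok=True)
    return [name for name in REQUIRED if not (work / name).exists()]
```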

Note that for this example, I'm hardcoding the path to a ROOT file, with the appropriate XRootD prefix (root://cmsxrootd.fnal.gov/) prepended.
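If you want to point the script at other files, the prefixing is just string concatenation: the redirector, then the /store/... logical file name, which gives the double slash you see in the script. A sketch with a hypothetical helper, xrootd_url; the default redirector is the FNAL one hardcoded in the demo script.

```python
def xrootd_url(lfn, redirector="root://cmsxrootd.fnal.gov"):
    """Prepend an XRootD redirector to a CMS logical file name (/store/...)."""
    if not lfn.startswith("/store/"):
        raise ValueError(f"expected a /store/... path, got {lfn!r}")
    # Host, a separating slash, then the absolute /store path: hence the "//".
    return f"{redirector}/{lfn}"
```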

Now I submit the .jdl file from a regular shell, outside of Singularity (i.e. after exiting the container).

condor_submit run_singularity_bespoke_demo.jdl

If this runs correctly, the job imports coffea (from the Singularity image) and hepfile (from myenv). It then uses coffea to open the ROOT file, counts the events, and prints that number to stdout before exiting.

You can see the output in the logs directory, in the file ending in .stdout.
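Since the .jdl builds the log names from $(Executable), $(cluster), and $(process), you can work out exactly which files to look at. A sketch, assuming a single job (process 0) and the Executable name from the .jdl; condor_submit prints the cluster number when you submit. log_files is a hypothetical helper.

```python
from pathlib import Path

EXECUTABLE = "singularity_bespoke_demo.sh"  # value of Executable in the .jdl


def log_files(cluster, process=0, logdir="logs"):
    """Paths written by the Output, Error, and Log lines of the .jdl for one job."""
    stem = f"{EXECUTABLE}_{cluster}_{process}"
    return {kind: Path(logdir) / f"{stem}.{kind}" for kind in ("stdout", "stderr", "condor")}
```

For example, print(log_files(12345)["stdout"].read_text()) once a job in cluster 12345 has finished.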

run_singularity_bespoke_demo.jdl

universe = vanilla
Executable = singularity_bespoke_demo.sh
# Run the job inside the same coffea image used to build myenv
+SingularityImage = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/coffeateam/coffea-dask:latest"
# Pass along your grid proxy so the job can read files over XRootD
use_x509userproxy = true
should_transfer_files = YES
WhenToTransferOutput = ON_EXIT_OR_EVICT
notification = never
# The executable is transferred automatically; list everything else here
Transfer_Input_Files = singularity_bespoke_demo.py, myenv.tgz
# Don't forget to make the logs directory ahead of time!
Output = logs/$(Executable)_$(cluster)_$(process).stdout
Error = logs/$(Executable)_$(cluster)_$(process).stderr
Log = logs/$(Executable)_$(cluster)_$(process).condor
Queue 1

singularity_bespoke_demo.py

import numpy as np
import awkward as ak
import uproot
# hepfile comes from myenv; if this import fails, the venv was not picked up
import hepfile
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema

# Hardcoded NanoAOD file, read over XRootD through the FNAL redirector
infilename = "root://cmsxrootd.fnal.gov//store/mc/RunIISummer20UL18NanoAODv9/TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v16_L1v1-v1/130000/44187D37-0301-3942-A6F7-C723E9F4813D.root"
print(f"Reading in {infilename}")

# Build NanoEvents from the file and count the events
events = NanoEventsFactory.from_root(infilename, schemaclass=NanoAODSchema).events()
print(f"There are {len(events)} events in the file")
print("Exiting...")
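
singularity_bespoke_demo.sh

#!/usr/bin/env bash
# Runs on the worker node, inside the Singularity image set in the .jdl
echo "Untarring the virtual environment"
# Do this *not* verbose
tar -zxf myenv.tgz
echo
echo "Activating our virtual environment"
# Prepends the venv's bin directory to PATH, so "python" should be myenv's python
source myenv/bin/activate
echo
echo "Running our python example!"
python singularity_bespoke_demo.py
echo
echo "All done!"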
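If a job fails with ModuleNotFoundError for hepfile, it can help to have the job report which interpreter actually ran. A sketch, assuming you paste it at the top of singularity_bespoke_demo.py (report_env is a hypothetical helper): sys.prefix and sys.base_prefix differ only when python is running from a venv, VIRTUAL_ENV is set by the activate script, and find_spec shows where each module would be imported from. One thing worth knowing: the activate script stores the absolute path myenv had when it was created, so if that path does not exist on the worker node, the PATH change does nothing and python quietly falls back to the image's own interpreter.

```python
import importlib.util
import os
import sys


def report_env(modules=("hepfile", "coffea")):
    """Collect which interpreter is running and where each module would be imported from."""
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "in_venv": sys.prefix != sys.base_prefix,
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),  # set by `source myenv/bin/activate`
        "modules": {},
    }
    for name in modules:
        spec = importlib.util.find_spec(name)  # None if the module cannot be found
        info["modules"][name] = spec.origin if spec else None
    return info


if __name__ == "__main__":
    for key, value in report_env().items():
        print(f"{key}: {value}")
```

The printed lines end up in the job's .stdout file, next to the rest of the script's output.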
@Keane-Tan

Hi,

Thank you for writing this gist. It is very detailed and informative.

I am trying to add python packages to my coffea environment similarly to how this gist works. As a first step, I am trying to run the example code you have in the gist exactly based on your instructions, and I am running into an issue.

When your bash script is running on condor, it gives me this error message:

Traceback (most recent call last):
  File "singularity_bespoke_demo.py", line 5, in <module>
    import hepfile
ModuleNotFoundError: No module named 'hepfile'

After some debugging, it looks like the source myenv/bin/activate doesn't change which python I am using. Have you seen a similar issue when setting this up? Do you have an idea of what I'm missing?

Any help you can give will be much appreciated.

Best,
Keane

@mattbellis
Author

Hi Keane, thank you for your nice comments!

First, let's make sure that hepfile was installed in the right location when you ran

python -m pip install --ignore-installed hepfile

You can check this by looking for it in your myenv directory. For example, I can do

ls -ltr myenv/lib/python3.8/site-packages/

in the directory where I'm doing all this, and when I do, I see the hepfile subdirectory. (The python3.8 part of that path depends on the Python version in the image.)

Now keep in mind, hepfile might not be something you need, but if you're just running this example as-is, it should all work.

Let me know if any of this helps and if not, we can continue to try to debug the process!

Matt
