@GeorgySk
Last active November 27, 2023 14:53
How to run a serial pygwb pipeline on Reana cluster with ET MDC data

Setup

X.509 certificate

The following describes the procedure for getting the certificate for those who have a UB account, but the process should be similar for other Spanish institutions.

  1. Go to the Sectigo SAML portal.
  2. Find the "Universitat de Barcelona" and click on it.
  3. Identify yourself with the UB credentials.
  4. In the "Select your Certificate Profile to enable your enrollment options." field choose either "GÉANT Personal Authentication" or "GÉANT Personal Automated Authentication".
  5. In the "Enrollment Method" choose "Key Generation".
  6. Fill in the remaining fields as you see fit and submit the request. The certificate file should be downloaded promptly.
  7. Split the downloaded .p12 file into separate certificate and key files by running:
    openssl pkcs12 -in <your_cert>.p12 -clcerts -nokeys -out usercert.pem
    openssl pkcs12 -in <your_cert>.p12 -nocerts -nodes -out userkey.pem
    chmod 644 usercert.pem
    chmod 400 userkey.pem
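
Before moving on, it can help to confirm that both files exist and carry the permissions set by the chmod commands above. The following is a minimal sketch; the function name `check_certificates` is hypothetical and not part of any of the tools used here.

```python
import os
import stat

# Permissions set by the chmod commands above.
EXPECTED_MODES = {"usercert.pem": 0o644, "userkey.pem": 0o400}


def check_certificates(directory):
    """Return a list of problems found with the PEM files in `directory`."""
    problems = []
    for name, expected in EXPECTED_MODES.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
            continue
        # Keep only the permission bits of the file mode.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != expected:
            problems.append(f"{name}: mode {mode:o}, expected {expected:o}")
    return problems
```

An empty list means both files are in place with the expected permissions.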

ESCAPE account

  1. Apply for an account here: https://iam-escape.cloud.cnaf.infn.it/login. Approval can take several days, and a Zoom call might be requested to understand the reasons for the request.

  2. When the account is approved, request to join the "escape" group via the dashboard at https://iam-escape.cloud.cnaf.infn.it/dashboard#!/home by clicking the "Join a group" button.

    image

    When the request is approved, the "escape" group will be added in the "Groups" section:

    image

  3. Add your X.509 certificate in the "X.509 certificates" section. To do this, first import the certificate obtained in the "X.509 certificate" section above into your browser; the procedure depends on the browser. You may also need to restart the browser and log out of the ESCAPE page. When you open the ESCAPE page again or log back in, a browser pop-up should prompt you to choose the imported certificate. If everything was done correctly, a button for linking the X.509 certificate should appear in the ESCAPE dashboard.

Environment variables

Open the attached .env file and fill in the required data:

  • ESCAPE_USERNAME -- the username which was used to create the account at https://iam-escape.cloud.cnaf.infn.it/.
  • CERTIFICATES_PATH -- absolute path to the directory containing the usercert.pem and userkey.pem files.
  • REANA_CONFIG_PATH -- absolute path to the reana.yaml file (attached).
  • PYGWB_PARAMETERS_PATH -- absolute path to the parameters.ini file (attached).
  • REANA_ACCESS_TOKEN -- Reana access token which can be found in your Reana profile: https://reana-vre.cern.ch/signin.
  • WORKFLOW_NAME -- any name for a workflow; will be displayed on the Reana dashboard: https://reana-vre.cern.ch/.
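
A quick check of the filled-in .env file can catch leftover placeholders and relative paths before Docker is started. This is a minimal sketch, not part of the attached files: the helper names `parse_env`, `is_placeholder` and `check_env` are hypothetical, and the parser only handles plain KEY=VALUE lines.

```python
import posixpath

REQUIRED_KEYS = (
    "ESCAPE_USERNAME",
    "CERTIFICATES_PATH",
    "REANA_CONFIG_PATH",
    "PYGWB_PARAMETERS_PATH",
    "REANA_ACCESS_TOKEN",
    "WORKFLOW_NAME",
)


def parse_env(text):
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def is_placeholder(value):
    """True for empty values, <...> templates and the XXXX token stub."""
    return (
        not value
        or (value.startswith("<") and value.endswith(">"))
        or set(value) == {"X"}
    )


def check_env(values):
    """Map each problematic required key to a short description."""
    problems = {}
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if is_placeholder(value):
            problems[key] = "not filled in"
        elif key.endswith("_PATH") and not posixpath.isabs(value):
            problems[key] = "not an absolute path"
    return problems
```

For example, `check_env(parse_env(open(".env").read()))` returns an empty dictionary when every variable is set and all three paths are absolute.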

Execution

  1. cd to the directory with the docker-compose.yml and .env files (both files are attached).
  2. Run docker-compose up.
    Docker will now:
  • pull the image with the Rucio client (only the first time);
  • launch a container from this image;
  • use the provided certificates and the token to authenticate with Rucio and Reana;
  • create a workflow with the provided name, whose status can be monitored via the Reana dashboard -- https://reana-vre.cern.ch/;
  • upload reana.yaml that specifies the steps of the workflow;
  • upload parameters.ini that specifies the parameters of the data analysis run by pygwb;
  • start the workflow;
  • print the status of the workflow in the terminal every 2 seconds.

Right now, the status is printed indefinitely, so you have to stop the container manually by pressing Ctrl+C several times. I might change this behaviour later.
If everything was done correctly, the workflow should finish successfully after some time, and you should see something like this in the Reana dashboard:
image
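
One possible way to avoid the endless status loop is to stop polling once the workflow reaches a final state. The sketch below only illustrates the idea and is not what the attached docker-compose.yml does. `get_status` is a hypothetical callable that returns the current status as a string (how it obtains that from reana-client is left open), and the set of final states is an assumption about Reana's status names.

```python
import time

# Assumed names of the final workflow states in Reana.
TERMINAL_STATUSES = {"finished", "failed", "stopped", "deleted"}


def wait_for_workflow(get_status, interval=2.0, max_polls=None, sleep=time.sleep):
    """Poll `get_status` until a terminal status or `max_polls` is reached.

    Returns the last status seen.
    """
    polls = 0
    while True:
        status = get_status()
        polls += 1
        print(status)
        if status in TERMINAL_STATUSES:
            return status
        if max_polls is not None and polls >= max_polls:
            return status
        # Wait before asking again, like the 2-second sleep in the current loop.
        sleep(interval)
```

The `sleep` argument exists only so that the loop can be exercised without actually waiting.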

Selecting MDC data

The input data are selected in the reana.yaml file. In the given example, four files are downloaded from Rucio, two per channel (E1 and E2):

  • ET_OSB_MDC1:E-E1_STRAIN_DATA-1002641920-2048.gwf
  • ET_OSB_MDC1:E-E2_STRAIN_DATA-1002641920-2048.gwf
  • ET_OSB_MDC1:E-E1_STRAIN_DATA-1002643968-2048.gwf
  • ET_OSB_MDC1:E-E2_STRAIN_DATA-1002643968-2048.gwf

To see which data are available on Rucio, the easiest way is to use the JupyterHub interface -- https://jhub-vre.cern.ch/. After signing in with the ESCAPE credentials and starting the default environment, run

rucio list-dids --filter 'type=file' ET_OSB_MDC1:*_STRAIN_DATA-*

to get the list of the files. Note that these files are not in sync with the ones found on the HTTP web server -- http://et-origin.cism.ucl.ac.be/.

When changing the input data, it is also necessary to update the parameters.ini and reana.yaml files.
In parameters.ini:

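  • t0 should be set to the start time, i.e. the smallest GPS time found in the names of the selected .gwf files (1002641920 in the given example);
  • tf should be set to the end time, i.e. the largest GPS start time among the selected files plus 2048 (the duration given by the last number in the file name);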
  • interferometer_list should be updated with the values for the given channels -- E0, E1, E2, or E3;
  • local_data_path at the end of the file should also be updated with the corresponding values.
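
The values listed above can be derived from the names of the selected files. The sketch below assumes that every name follows the pattern seen in the example, `<scope>:E-<channel>_STRAIN_DATA-<start>-<duration>.gwf`; the function name `summarize_selection` is hypothetical. Note that it returns the upper bound for tf given by the rule above, while the attached parameters.ini uses a slightly smaller value (1002645976).

```python
import re

# Pattern inferred from the example file names; an assumption, not a spec.
NAME_PATTERN = re.compile(
    r"^(?P<scope>[^:]+):E-(?P<ifo>E\d)_STRAIN_DATA-"
    r"(?P<start>\d+)-(?P<duration>\d+)\.gwf$"
)


def summarize_selection(dids):
    """Derive t0, tf, interferometer_list and local_data_path from file names."""
    starts, ends, ifos = [], [], set()
    for did in dids:
        match = NAME_PATTERN.match(did)
        if match is None:
            raise ValueError(f"unexpected file name: {did}")
        start = int(match["start"])
        starts.append(start)
        ends.append(start + int(match["duration"]))
        ifos.add(match["ifo"])
    ifo_list = sorted(ifos)
    return {
        "t0": min(starts),
        "tf": max(ends),
        "interferometer_list": ifo_list,
        "local_data_path": ",".join(f"{ifo}:{ifo}.lcf" for ifo in ifo_list),
    }
```

For the four files of the given example this yields t0 = 1002641920, tf = 1002646016, the interferometers E1 and E2, and local_data_path = E1:E1.lcf,E2:E2.lcf.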

In reana.yaml, the following section should be updated with the values corresponding to the selected channels:

- lal_cache ET_OSB_MDC1/E-E1_*.gwf > E1.lcf
- lal_cache ET_OSB_MDC1/E-E2_*.gwf > E2.lcf
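
When many files or channels are involved, the command lines for reana.yaml can be generated rather than typed by hand. The following minimal sketch mirrors the commands of the given example; the function names `fetch_commands` and `cache_commands` are hypothetical, and the scope ET_OSB_MDC1 is passed in rather than assumed.

```python
def fetch_commands(dids):
    """One `rucio get` line per selected file, as in the fetchdata step."""
    return [f"rucio get {did}" for did in dids]


def cache_commands(scope, channels):
    """One `lal_cache` line per channel, as in the fitdata step."""
    return [
        f"lal_cache {scope}/E-{channel}_*.gwf > {channel}.lcf"
        for channel in channels
    ]


if __name__ == "__main__":
    # Print the lines as YAML list items ready to paste under `commands:`.
    for line in cache_commands("ET_OSB_MDC1", ["E1", "E2"]):
        print(f"- {line}")
```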

Image selection

In the given example, Reana sets up the environment defined by this image: georgysk/pygwb. Another image can be used instead: atanasi/pygwb. To switch, update the environment value in reana.yaml. Note, however, that the latter image ships pygwb version 1.0.0, which uses a different syntax for defining local_data_path in parameters.ini. Instead of

local_data_path = E1:E1.lcf,E2:E2.lcf

it uses

local_data_path = {E1:E1.lcf E2:E2.lcf}
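
The two syntaxes differ only in the separator and the surrounding braces, so the value can be produced by one small helper. This is a sketch based solely on the two examples above; the function name `format_local_data_path` and its `braces` flag are hypothetical.

```python
def format_local_data_path(channels, braces=False):
    """Build the local_data_path value for the given channels.

    braces=False gives the comma-separated form used with georgysk/pygwb,
    braces=True the brace-delimited form shown for atanasi/pygwb.
    """
    pairs = [f"{channel}:{channel}.lcf" for channel in channels]
    if braces:
        return "{" + " ".join(pairs) + "}"
    return ",".join(pairs)
```

For example, `format_local_data_path(["E1", "E2"])` gives the first form and `format_local_data_path(["E1", "E2"], braces=True)` the second.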

Links

.env

ESCAPE_USERNAME=<your_escape_username>
CERTIFICATES_PATH=<absolute_path_to_directory_containing_usercert.pem_and_userkey.pem>
REANA_CONFIG_PATH=<absolute_path_to_reana.yaml>
PYGWB_PARAMETERS_PATH=<absolute_path_to_parameters.ini>
REANA_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXX
WORKFLOW_NAME=<workflow_name>

docker-compose.yml

version: '3'
services:
  rucio-client:
    image: ghcr.io/vre-hub/vre-rucio-client
    container_name: rucio-client
    user: root
    environment:
      - RUCIO_CFG_CLIENT_X509_PROXY=/tmp/x509up
      - RUCIO_CFG_AUTH_TYPE=x509_proxy
    volumes:
      - ${CERTIFICATES_PATH}/usercert.pem:/opt/rucio/etc/client.crt
      - ${CERTIFICATES_PATH}/userkey.pem:/opt/rucio/etc/client.key
      - ${REANA_CONFIG_PATH}:/home/user/reana.yaml
      - ${PYGWB_PARAMETERS_PATH}:/home/user/parameters.ini
    entrypoint: ["/bin/sh","-c"]
    command:
      - |
        voms-proxy-init --voms escape --cert /opt/rucio/etc/client.crt --key /opt/rucio/etc/client.key --out /tmp/x509up
        export REANA_SERVER_URL=https://reana-vre.cern.ch
        export REANA_ACCESS_TOKEN=${REANA_ACCESS_TOKEN}
        reana-client secrets-add --env VONAME=escape \
                                 --env VOMSPROXY_FILE=x509up \
                                 --file /tmp/x509up \
                                 --env RUCIO_USERNAME=${ESCAPE_USERNAME} \
                                 --env RUCIO_RUCIO_HOST=https://vre-rucio.cern.ch \
                                 --env RUCIO_AUTH_HOST=https://vre-rucio-auth.cern.ch \
                                 --overwrite
        reana-client create -w ${WORKFLOW_NAME}
        export REANA_WORKON=${WORKFLOW_NAME}
        reana-client upload
        reana-client start
        while true; do reana-client status; sleep 2; done
    stdin_open: true
    tty: true

parameters.ini

[data_specs]
t0 = 1002641920
tf = 1002645976
interferometer_list = ["E1", "E2"]
data_type = local
channel = STRAIN
time_shift = 0

[preprocessing]
new_sample_rate = 1024
cutoff_frequency = 1.0
segment_duration = 192
number_cropped_seconds = 2
window_downsampling = hamming
ftype = fir

[gating]
gate_data = False
gate_whiten = False
gate_tzero = 1.0
gate_tpad = 0.5
gate_threshold = 50.0
cluster_window = 0.5

[window_fft_specs]
window_fftgram = hann

[window_fft_welch_specs]
window_fftgram = hann

[density_estimation]
frequency_resolution = 0.25
N_average_segments_welch_psd = 2
coarse_grain_psd = False
coarse_grain_csd = True
overlap_factor_welch = 0.5
overlap_factor = 0.5

[postprocessing]
polarization = tensor
alpha = 0.0
fref = 10.0
flow = 5.0
fhigh = 500.0

[data_quality]
notch_list_path =
calibration_epsilon = 0.0
alphas_delta_sigma_cut = ['-5', '0', '3']
delta_sigma_cut = 0.2
return_naive_and_averaged_sigmas = False

[local_data]
local_data_path = E1:E1.lcf,E2:E2.lcf

[output]
save_data_type = npz

reana.yaml

version: 0.6.0
inputs:
  files:
    - parameters.ini
workflow:
  type: serial
  specification:
    steps:
      - name: fetchdata
        voms_proxy: true
        rucio: true
        environment: 'ghcr.io/vre-hub/vre-rucio-client:v0.1.2-1-0487cc0'
        commands:
          - rucio get ET_OSB_MDC1:E-E1_STRAIN_DATA-1002641920-2048.gwf
          - rucio get ET_OSB_MDC1:E-E2_STRAIN_DATA-1002641920-2048.gwf
          - rucio get ET_OSB_MDC1:E-E1_STRAIN_DATA-1002643968-2048.gwf
          - rucio get ET_OSB_MDC1:E-E2_STRAIN_DATA-1002643968-2048.gwf
      - name: fitdata
        environment: georgysk/pygwb
        commands:
          - lal_cache ET_OSB_MDC1/E-E1_*.gwf > E1.lcf
          - lal_cache ET_OSB_MDC1/E-E2_*.gwf > E2.lcf
          - pygwb_pipe --apply_dsc True --param_file parameters.ini --pickle_out False --wipe_ifo True --calc_pt_est True