@GeorgySk
Created December 13, 2023 10:34
How to run a parallel pygwb pipeline on Reana cluster with ET MDC data

Setup

For setting up the certificates and the accounts, see How to run a serial pygwb pipeline on Reana cluster with ET MDC data.

Environment variables

Open the attached .env file and fill in the required data:

  • ESCAPE_USERNAME -- the username which was used to create the account at https://iam-escape.cloud.cnaf.infn.it/.
  • CERTIFICATES_PATH -- absolute path to the directory containing the usercert.pem and userkey.pem files.
  • REANA_CONFIG_PATH -- absolute path to the reana.yaml file (attached).
  • PYGWB_PARAMETERS_PATH -- absolute path to the parameters.ini file used by pygwb_pipe and pygwb_combine (attached).
  • SNAKEFILE_PATH -- absolute path to the Snakefile (attached).
  • SNAKEMAKE_CONFIG_PATH -- absolute path to the config.yaml file used by snakemake (attached).
  • REANA_ACCESS_TOKEN -- Reana access token which can be found in your Reana profile: https://reana-vre.cern.ch/signin.
  • WORKFLOW_NAME -- any name for a workflow; will be displayed on the Reana dashboard: https://reana-vre.cern.ch/.
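Before launching the container, it can help to confirm that every variable listed above is actually filled in. The following is a minimal sketch, not part of the attached files: the helper names `parse_env` and `unfilled_keys` are hypothetical, and the check assumes that unfilled values still look like the `<...>` or `XXXX...` placeholders from the attached .env file.

```python
# Hypothetical pre-flight check for the .env file (not part of the gist).
REQUIRED_KEYS = (
    "ESCAPE_USERNAME",
    "CERTIFICATES_PATH",
    "REANA_CONFIG_PATH",
    "PYGWB_PARAMETERS_PATH",
    "SNAKEFILE_PATH",
    "SNAKEMAKE_CONFIG_PATH",
    "REANA_ACCESS_TOKEN",
    "WORKFLOW_NAME",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unfilled_keys(values):
    """Return the required keys that are missing, empty, or still placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        is_angle_placeholder = value.startswith("<") and value.endswith(">")
        is_x_placeholder = bool(value) and set(value) == {"X"}
        if not value or is_angle_placeholder or is_x_placeholder:
            problems.append(key)
    return problems


if __name__ == "__main__":
    from pathlib import Path

    env_file = Path(".env")  # assumed to sit next to docker-compose.yml
    if env_file.exists():
        print("Unfilled variables:", unfilled_keys(parse_env(env_file.read_text())))
```

Running the script from the directory that holds the .env file prints the variables that still need attention; an empty list means all eight are set.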

Execution

  1. cd to the directory with the docker-compose.yml and .env files (both files are attached).
  2. Run docker-compose up. Docker will now:
    • pull the image with the Rucio client (only the first time);
    • launch a container from this image;
    • use the provided certificates and the token to authenticate with Rucio and Reana;
    • create a workflow with the provided name, whose status can be monitored on the Reana dashboard -- https://reana-vre.cern.ch/;
    • upload reana.yaml that specifies the steps of the workflow;
    • upload Snakefile that specifies the workflow, its jobs, and the corresponding scripts;
    • upload config.yaml that specifies the workflow parameters used by snakemake such as the channels and the time limits;
    • upload parameters.ini that specifies the parameters of the data analysis run by pygwb;
    • start the workflow;
    • print the status of the workflow in the terminal every 2 seconds.

Currently, the status is printed indefinitely, so you have to stop the container manually by pressing Ctrl+C several times. I might change this behaviour later.
If everything was set up correctly, the workflow should finish after some time, and the Reana dashboard will show something like this:
[image: screenshot of the Reana dashboard]
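One possible way to avoid the endless status loop is to stop polling once the workflow reaches a terminal state. The sketch below is only an illustration and is not what the attached docker-compose.yml does. It assumes that Python is available where `reana-client` runs, and that the output of `reana-client status` contains the workflow state as a separate word; the function names are hypothetical.

```python
import subprocess
import time

# States after which a REANA workflow no longer changes (assumed list).
TERMINAL_STATES = ("finished", "failed", "stopped", "deleted")


def find_terminal_state(status_text):
    """Return the first terminal state found as a word in the text, else None."""
    words = status_text.lower().split()
    for state in TERMINAL_STATES:
        if state in words:
            return state
    return None


def reana_status():
    """Run `reana-client status` and return whatever it prints to stdout."""
    result = subprocess.run(["reana-client", "status"],
                            capture_output=True, text=True)
    return result.stdout


def poll_until_done(get_status=reana_status, interval=2.0,
                    max_checks=None, sleep=time.sleep):
    """Print the status every `interval` seconds until a terminal state appears.

    Returns the terminal state, or None if `max_checks` polls were used up.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        text = get_status()
        print(text, end="")
        state = find_terminal_state(text)
        if state is not None:
            return state
        checks += 1
        sleep(interval)
    return None
```

The `get_status` and `sleep` parameters exist only so that the loop can be tried out without a Reana connection, for example by passing a function that returns canned text.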

Selecting MDC data

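To select data, edit the config.yaml file. The fields first_timestamp, last_timestamp, and channels determine which files are downloaded from Rucio: one file per channel for every start time from first_timestamp to last_timestamp (inclusive), in steps of duration (2048 in the attached config.yaml). For example, with the values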

first_timestamp: 1002641920
last_timestamp: 1002643968
channels: ['E1', 'E2']

the following files will be selected:

  • ET_OSB_MDC1:E-E1_STRAIN_DATA-1002641920-2048.gwf
  • ET_OSB_MDC1:E-E2_STRAIN_DATA-1002641920-2048.gwf
  • ET_OSB_MDC1:E-E1_STRAIN_DATA-1002643968-2048.gwf
  • ET_OSB_MDC1:E-E2_STRAIN_DATA-1002643968-2048.gwf
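The selection can be reproduced with a few lines of Python. This is a sketch that mirrors the `range(first_timestamp, last_timestamp + 1, duration)` logic of the attached Snakefile; the function name `selected_dids` is hypothetical, and the defaults for `duration` and `scope` are taken from the attached config.yaml.

```python
def selected_dids(first_timestamp, last_timestamp, channels,
                  duration=2048, scope="ET_OSB_MDC1"):
    """List the Rucio file names selected by the given config values."""
    # Start times run from first_timestamp to last_timestamp inclusive.
    start_times = range(first_timestamp, last_timestamp + 1, duration)
    return [f"{scope}:E-{channel}_STRAIN_DATA-{t0}-{duration}.gwf"
            for t0 in start_times
            for channel in channels]


if __name__ == "__main__":
    for name in selected_dids(1002641920, 1002643968, ["E1", "E2"]):
        print(name)
```

Note that a last_timestamp that is not a whole number of durations after first_timestamp is effectively rounded down to the previous start time.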

You might want to see which data is present on Rucio. For this, the easiest way is to use the JupyterHub interface -- https://jhub-vre.cern.ch/. After signing in with the ESCAPE credentials and starting the default environment, run

rucio list-dids --filter 'type=file' ET_OSB_MDC1:*_STRAIN_DATA-*

to get the list of the files. Note that these files are not in sync with the ones found on the HTTP web server -- http://et-origin.cism.ucl.ac.be/.
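To pick valid values for first_timestamp and last_timestamp, the listing can be summarised per channel. The sketch below assumes only that the file names appear in the command output in the `scope:E-<channel>_STRAIN_DATA-<t0>-<duration>.gwf` form shown above; the exact layout of the `rucio list-dids` output is not assumed, and `parse_dids` is a hypothetical helper.

```python
import re

# Matches names such as ET_OSB_MDC1:E-E1_STRAIN_DATA-1002641920-2048.gwf
DID_PATTERN = re.compile(
    r"(?P<scope>[^:\s]+):E-(?P<channel>\w+?)_STRAIN_DATA-"
    r"(?P<t0>\d+)-(?P<duration>\d+)\.gwf"
)


def parse_dids(listing):
    """Map each channel to the sorted start times found in the listing text."""
    found = {}
    for match in DID_PATTERN.finditer(listing):
        found.setdefault(match["channel"], []).append(int(match["t0"]))
    return {channel: sorted(times) for channel, times in found.items()}


if __name__ == "__main__":
    import sys

    # Example: rucio list-dids ... | python this_script.py
    for channel, times in parse_dids(sys.stdin.read()).items():
        print(channel, times[0], times[-1], len(times))
```

For each channel this prints the earliest start time, the latest start time, and the number of files, which is enough to spot gaps or channels with different coverage.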


# File: .env
ESCAPE_USERNAME=<your_escape_username>
CERTIFICATES_PATH=<absolute_path_to_directory_containing_usercert.pem_and_userkey.pem>
REANA_CONFIG_PATH=<absolute_path_to_reana.yaml>
PYGWB_PARAMETERS_PATH=<absolute_path_to_parameters.ini>
SNAKEFILE_PATH=<absolute_path_to_Snakefile>
SNAKEMAKE_CONFIG_PATH=<absolute_path_to_config.yaml>
REANA_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXX
WORKFLOW_NAME=<workflow_name>
# File: config.yaml
first_timestamp: 1002641920
last_timestamp: 1002660352
channels: ['E1', 'E2']
alpha: 0.666666
fref: null # reference frequency passed to `pygwb_combine`
duration: 2048
output_path: 'output' # for `pygwb_combine`
parameters_path: 'parameters.ini'
scope: 'ET_OSB_MDC1'
pygwb_container: 'docker://docker.io/georgysk/pygwb'
rucio_client_container: 'docker://ghcr.io/vre-hub/vre-rucio-client:latest'
final_estimate_template: "point_estimate_sigma_spectra_alpha_{alpha:.1f}\
  _fref_{fref}_{first_timestamp}-{last_timestamp}.npz"
estimate_file_template: 'point_estimate_sigma_{t0}-{tf}.npz'
gwf_template: 'E-{{channel}}_STRAIN_DATA-{{t0}}-{duration}.gwf'
# For `pygwb_pipe`
apply_dsc: true
pickle_out: false
wipe_ifo: true
calc_pt_est: true
pygwb_combine_template: >-
  pygwb_combine
  --param_file {parameters_path}
  --alpha {alpha}
  --data_path .
  --out_path {output_path}
pygwb_pipe_template: >-
  pygwb_pipe
  --apply_dsc {apply_dsc}
  --param_file {parameters_path}
  --pickle_out {pickle_out}
  --wipe_ifo {wipe_ifo}
  --calc_pt_est {calc_pt_est}
  --t0 {{t0}}
  --tf {{tf}}
  --interferometer_list {channels[0]} {channels[1]}
  --local_data_path
  {channels[0]}:{{gwf_paths[0]}},{channels[1]}:{{gwf_paths[1]}}
# File: docker-compose.yml
version: '3'
services:
  rucio-client:
    image: ghcr.io/vre-hub/vre-rucio-client
    container_name: rucio-client
    user: root
    environment:
      - RUCIO_CFG_CLIENT_X509_PROXY=/tmp/x509up
      - RUCIO_CFG_AUTH_TYPE=x509_proxy
    volumes:
      - ${CERTIFICATES_PATH}/usercert.pem:/opt/rucio/etc/client.crt
      - ${CERTIFICATES_PATH}/userkey.pem:/opt/rucio/etc/client.key
      - ${REANA_CONFIG_PATH}:/home/user/reana.yaml
      - ${PYGWB_PARAMETERS_PATH}:/home/user/parameters.ini
      - ${SNAKEFILE_PATH}:/home/user/Snakefile
      - ${SNAKEMAKE_CONFIG_PATH}:/home/user/config.yaml
    entrypoint: ["/bin/sh","-c"]
    command:
      - |
        voms-proxy-init --voms escape --cert /opt/rucio/etc/client.crt --key /opt/rucio/etc/client.key --out /tmp/x509up
        export REANA_SERVER_URL=https://reana-vre.cern.ch
        export REANA_ACCESS_TOKEN=${REANA_ACCESS_TOKEN}
        reana-client secrets-add --env VONAME=escape \
                                 --env VOMSPROXY_FILE=x509up \
                                 --file /tmp/x509up \
                                 --env RUCIO_USERNAME=${ESCAPE_USERNAME} \
                                 --env RUCIO_RUCIO_HOST=https://vre-rucio.cern.ch \
                                 --env RUCIO_AUTH_HOST=https://vre-rucio-auth.cern.ch \
                                 --overwrite
        reana-client create -w ${WORKFLOW_NAME}
        export REANA_WORKON=${WORKFLOW_NAME}
        reana-client upload
        reana-client start
        while true; do reana-client status; sleep 2; done
    stdin_open: true
    tty: true
# File: parameters.ini
[data_specs]
interferometer_list =
data_type = local
channel = STRAIN
time_shift = 0
[preprocessing]
new_sample_rate = 1024
cutoff_frequency = 1.0
segment_duration = 192
number_cropped_seconds = 2
window_downsampling = hamming
ftype = fir
[gating]
gate_data = False
gate_whiten = False
gate_tzero = 1.0
gate_tpad = 0.5
gate_threshold = 50.0
cluster_window = 0.5
[window_fft_specs]
window_fftgram = hann
[window_fft_welch_specs]
window_fftgram = hann
[density_estimation]
frequency_resolution = 0.25
N_average_segments_welch_psd = 2
coarse_grain_psd = False
coarse_grain_csd = True
overlap_factor_welch = 0.5
overlap_factor = 0.5
[postprocessing]
polarization = tensor
alpha = 0.0
fref = 10.0
flow = 5.0
fhigh = 500.0
[data_quality]
notch_list_path =
calibration_epsilon = 0.0
alphas_delta_sigma_cut = ['-5', '0', '3']
delta_sigma_cut = 0.2
return_naive_and_averaged_sigmas = False
[local_data]
local_data_path =
[output]
save_data_type = npz
# File: reana.yaml
version: 0.8.0
inputs:
  files:
    - parameters.ini
    - Snakefile
    - config.yaml
workflow:
  type: snakemake
  file: Snakefile
# File: Snakefile
from pathlib import Path

configfile: "config.yaml"

all_start_times = range(config['first_timestamp'],
                        config['last_timestamp'] + 1,
                        config['duration'])
all_end_times = range(all_start_times.start + all_start_times.step,
                      all_start_times.stop + all_start_times.step,
                      all_start_times.step)
output_path = Path(config['output_path'])
final_estimate_filename = config['final_estimate_template'].format(**config)
final_estimate_filepath = output_path / final_estimate_filename
gwf_template = config['gwf_template'].format(duration=config['duration'])
gwf_path_template = Path(config['scope']) / gwf_template
rucio_gwf_path_template = f"{config['scope']}:{gwf_template}"
rucio_get_template = f"rucio get {rucio_gwf_path_template}"

rule all:
    input: final_estimate_filepath

rule pygwb_combine:
    input:
        expand(config['estimate_file_template'],
               zip,
               t0=all_start_times,
               tf=all_end_times)
    output: final_estimate_filepath
    container: config['pygwb_container']
    shell: config['pygwb_combine_template'].format(**config)

rule run_pygwb:
    input:
        expand(gwf_path_template,
               channel=config['channels'],
               allow_missing=True)
    output: temp(config['estimate_file_template'])
    container: config['pygwb_container']
    threads: workflow.cores
    shell:
        config['pygwb_pipe_template'].format(**config).format(
            t0='{wildcards.t0}',
            tf='{wildcards.tf}',
            gwf_paths=['{input[0]}', '{input[1]}'])

rule download_data:
    output: temp(gwf_path_template)
    container: config['rucio_client_container']
    resources:
        voms_proxy=True,
        rucio=True
    threads: workflow.cores
    shell:
        rucio_get_template.format(channel='{wildcards.channel}',
                                  t0='{wildcards.t0}')