GeorgySk/!readme.md

## !readme.md

      
How to run a parallel pygwb pipeline on Reana cluster with ET MDC data

Setup

For setting up the certificates and the accounts, see "How to run a serial pygwb pipeline on Reana cluster with ET MDC data".

Environment variables

Open the attached .env file and fill in the required data:

ESCAPE_USERNAME -- the username which was used to create the account at https://iam-escape.cloud.cnaf.infn.it/.
CERTIFICATES_PATH -- absolute path to the directory containing the usercert.pem and userkey.pem files.
REANA_CONFIG_PATH -- absolute path to the reana.yaml file (attached).
PYGWB_PARAMETERS_PATH -- absolute path to the parameters.ini file used by pygwb_pipe and pygwb_combine (attached).
SNAKEFILE_PATH -- absolute path to the Snakefile (attached).
SNAKEMAKE_CONFIG_PATH -- absolute path to the config.yaml file used by snakemake (attached).
REANA_ACCESS_TOKEN -- Reana access token which can be found in your Reana profile: https://reana-vre.cern.ch/signin.
WORKFLOW_NAME -- any name for a workflow; will be displayed on the Reana dashboard: https://reana-vre.cern.ch/.
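Before starting the container, it can help to check that every variable has been filled in. The following is a minimal sketch of such a check, not part of the attached files: the helper names `parse_env`, `is_placeholder`, and `find_problems` are hypothetical, and the placeholder detection assumes the `<...>` and `XXXX` conventions used in the attached .env template.

```python
from pathlib import Path

# Variable names taken from the list above.
REQUIRED_KEYS = (
    "ESCAPE_USERNAME", "CERTIFICATES_PATH", "REANA_CONFIG_PATH",
    "PYGWB_PARAMETERS_PATH", "SNAKEFILE_PATH", "SNAKEMAKE_CONFIG_PATH",
    "REANA_ACCESS_TOKEN", "WORKFLOW_NAME",
)
# Variables that must point to a single existing file.
FILE_KEYS = (
    "REANA_CONFIG_PATH", "PYGWB_PARAMETERS_PATH",
    "SNAKEFILE_PATH", "SNAKEMAKE_CONFIG_PATH",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def is_placeholder(value):
    """True for empty values, <...> templates, and all-X dummy tokens."""
    return (not value
            or (value.startswith("<") and value.endswith(">"))
            or set(value) == {"X"})


def find_problems(values):
    """Return a list of human-readable problems; empty if all looks fine."""
    problems = [f"{key} is not filled in"
                for key in REQUIRED_KEYS
                if is_placeholder(values.get(key, ""))]
    certs = values.get("CERTIFICATES_PATH", "")
    if not is_placeholder(certs):
        for name in ("usercert.pem", "userkey.pem"):
            if not (Path(certs) / name).is_file():
                problems.append(f"{name} not found in CERTIFICATES_PATH")
    for key in FILE_KEYS:
        path = values.get(key, "")
        if not is_placeholder(path) and not Path(path).is_file():
            problems.append(f"{key} does not point to an existing file")
    return problems
```

To use it, pass the contents of the .env file to `parse_env` and print whatever `find_problems` returns; an empty list means the basic checks passed.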

Execution


cd to the directory with the docker-compose.yml and .env files (both files are attached).
Run docker-compose up. Docker will now:

pull the image with the Rucio client (only the first time);
launch a container from this image;
use the provided certificates and the token to authenticate with Rucio and Reana;
create the workflow with the provided name, whose status can be monitored via the Reana dashboard -- https://reana-vre.cern.ch/;
upload reana.yaml that specifies the steps of the workflow;
upload Snakefile that specifies the workflow, its jobs, and the corresponding scripts;
upload config.yaml that specifies the workflow parameters used by snakemake such as the channels and the time limits;
upload parameters.ini that specifies the parameters of the data analysis run by pygwb;
start the workflow;
print the status of the workflow in the terminal every 2 seconds.


Right now, the status is printed indefinitely, so you have to forcefully shut down the container by pressing Ctrl+C several times. I might change this behaviour later.
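One way to avoid the endless status loop is to stop polling once the workflow reaches a terminal state. The sketch below illustrates the idea only and is not what the attached docker-compose.yml does. The function `wait_for_workflow` and its parameters are hypothetical, and the set of terminal status names is an assumption that should be checked against your Reana version. How the status string is obtained (for example by wrapping `reana-client status`) is left to the `get_status` callable.

```python
import time

# Assumption: these status names mark a workflow that will not change anymore.
TERMINAL_STATUSES = frozenset({"finished", "failed", "stopped", "deleted"})


def wait_for_workflow(get_status, interval=2.0, max_polls=None,
                      sleep=time.sleep):
    """Poll get_status() until a terminal status is seen and return it.

    get_status: callable returning the current status as a string.
    interval:   seconds to wait between polls (2 s, as in the current loop).
    max_polls:  optional upper bound on the number of polls.
    sleep:      injectable sleep function, useful for testing.
    """
    polls = 0
    while True:
        status = get_status()
        print(status)
        polls += 1
        if status in TERMINAL_STATUSES:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"workflow still '{status}' after {polls} polls")
        sleep(interval)
```

Passing the status source as a callable keeps the loop independent of how the status is fetched, so the same function works with a CLI wrapper or a stub.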

If everything was set up correctly, the workflow should finish successfully after some time, and the Reana dashboard will show it as finished.


Selecting MDC data

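To select data, edit the config.yaml file. The fields first_timestamp, last_timestamp, and channels determine the names of the files downloaded from Rucio: one file per channel for every start time from first_timestamp to last_timestamp, in steps of duration (2048 s in the attached config.yaml). For example, for the values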
first_timestamp: 1002641920
last_timestamp: 1002643968
channels: ['E1', 'E2']

the following files will be selected:

ET_OSB_MDC1:E-E1_STRAIN_DATA-1002641920-2048.gwf
ET_OSB_MDC1:E-E2_STRAIN_DATA-1002641920-2048.gwf
ET_OSB_MDC1:E-E1_STRAIN_DATA-1002643968-2048.gwf
ET_OSB_MDC1:E-E2_STRAIN_DATA-1002643968-2048.gwf
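The selection rule can be sketched in a few lines of Python. This mirrors the range logic and the file name template from the attached Snakefile and config.yaml; the function `select_files` itself is a hypothetical helper for illustration, with `duration` and `scope` defaulting to the values in the attached config.yaml.

```python
def select_files(first_timestamp, last_timestamp, channels,
                 duration=2048, scope="ET_OSB_MDC1"):
    """List the Rucio file identifiers selected by the given settings."""
    names = []
    # Start times run from first_timestamp to last_timestamp inclusive,
    # in steps of `duration` seconds.
    for t0 in range(first_timestamp, last_timestamp + 1, duration):
        for channel in channels:
            names.append(
                f"{scope}:E-{channel}_STRAIN_DATA-{t0}-{duration}.gwf")
    return names


for name in select_files(1002641920, 1002643968, ["E1", "E2"]):
    print(name)
```

With the values from the example above, this prints the same four file names in the same order. With the attached config.yaml (last_timestamp: 1002660352) it yields 10 start times and therefore 20 files.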

To see which data is available on Rucio, the easiest way is to use the JupyterHub interface -- https://jhub-vre.cern.ch/. After signing in with the ESCAPE credentials and starting the default environment, run
rucio list-dids --filter 'type=file' ET_OSB_MDC1:*_STRAIN_DATA-*
to get the list of the files. Note that these files are not in sync with the ones found on the HTTP web server -- http://et-origin.cism.ucl.ac.be/.
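To turn such a listing into values for config.yaml, the start timestamps can be extracted per channel. The sketch below is only an illustration: `summarize_dids` and `common_timestamps` are hypothetical helpers, and the code assumes that the file identifiers appear verbatim in the command output in the form shown earlier (scope, channel, start time, duration); the exact layout of the output is not relied upon.

```python
import re

# Matches identifiers such as ET_OSB_MDC1:E-E1_STRAIN_DATA-1002641920-2048.gwf
DID_PATTERN = re.compile(
    r"(?P<scope>\w+):E-(?P<channel>\w+?)_STRAIN_DATA-"
    r"(?P<t0>\d+)-(?P<duration>\d+)\.gwf")


def summarize_dids(text):
    """Map each channel to the sorted start timestamps found in `text`."""
    found = {}
    for match in DID_PATTERN.finditer(text):
        found.setdefault(match["channel"], set()).add(int(match["t0"]))
    return {channel: sorted(times) for channel, times in found.items()}


def common_timestamps(by_channel, channels):
    """Start timestamps that are available for every requested channel."""
    sets = [set(by_channel.get(channel, ())) for channel in channels]
    return sorted(set.intersection(*sets)) if sets else []
```

The smallest and largest values returned by `common_timestamps` for the chosen channels are candidates for first_timestamp and last_timestamp, provided that no start times are missing in between.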

Links


MDC page on ET wiki: https://wiki.et-gw.eu/OSB/DataAnalysisPlatform/MDC
Access to ET MDC data using HTTP web server: http://et-origin.cism.ucl.ac.be/
Pygwb documentation: https://pygwb.docs.ligo.org/pygwb/index.html
Snakemake documentation: https://snakemake.readthedocs.io/
VRE page: https://vre-hub.github.io/


## .env
ESCAPE_USERNAME=<your_escape_username>
CERTIFICATES_PATH=<absolute_path_to_directory_containing_usercert.pem_and_userkey.pem>
REANA_CONFIG_PATH=<absolute_path_to_reana.yaml>
PYGWB_PARAMETERS_PATH=<absolute_path_to_parameters.ini>
SNAKEFILE_PATH=<absolute_path_to_Snakefile>
SNAKEMAKE_CONFIG_PATH=<absolute_path_to_config.yaml>
REANA_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXX
WORKFLOW_NAME=<workflow_name>

## config.yaml
first_timestamp: 1002641920
last_timestamp: 1002660352
channels: ['E1', 'E2']

alpha: 0.666666
fref: null  # reference frequency passed to `pygwb_combine`

duration: 2048
output_path: 'output'  # for `pygwb_combine`
parameters_path: 'parameters.ini'
scope: 'ET_OSB_MDC1'

pygwb_container: 'docker://docker.io/georgysk/pygwb'
rucio_client_container: 'docker://ghcr.io/vre-hub/vre-rucio-client:latest'

final_estimate_template: "point_estimate_sigma_spectra_alpha_{alpha:.1f}\
  _fref_{fref}_{first_timestamp}-{last_timestamp}.npz"
estimate_file_template: 'point_estimate_sigma_{t0}-{tf}.npz'
gwf_template: 'E-{{channel}}_STRAIN_DATA-{{t0}}-{duration}.gwf'

# For `pygwb_pipe`
apply_dsc: true
pickle_out: false
wipe_ifo: true
calc_pt_est: true

pygwb_combine_template: >-
  pygwb_combine
  --param_file {parameters_path}
  --alpha {alpha}
  --data_path .
  --out_path {output_path}

pygwb_pipe_template: >-
  pygwb_pipe
  --apply_dsc {apply_dsc}
  --param_file {parameters_path}
  --pickle_out {pickle_out}
  --wipe_ifo {wipe_ifo}
  --calc_pt_est {calc_pt_est}
  --t0 {{t0}}
  --tf {{tf}}
  --interferometer_list {channels[0]} {channels[1]}
  --local_data_path
  {channels[0]}:{{gwf_paths[0]}},{channels[1]}:{{gwf_paths[1]}}

## docker-compose.yml
version: '3'

services:
  rucio-client:
    image: ghcr.io/vre-hub/vre-rucio-client
    container_name: rucio-client
    user: root
    environment:
      - RUCIO_CFG_CLIENT_X509_PROXY=/tmp/x509up
      - RUCIO_CFG_AUTH_TYPE=x509_proxy
    volumes:
      - ${CERTIFICATES_PATH}/usercert.pem:/opt/rucio/etc/client.crt
      - ${CERTIFICATES_PATH}/userkey.pem:/opt/rucio/etc/client.key
      - ${REANA_CONFIG_PATH}:/home/user/reana.yaml
      - ${PYGWB_PARAMETERS_PATH}:/home/user/parameters.ini
      - ${SNAKEFILE_PATH}:/home/user/Snakefile
      - ${SNAKEMAKE_CONFIG_PATH}:/home/user/config.yaml
    entrypoint: ["/bin/sh","-c"]
    command:
    - |
       voms-proxy-init --voms escape --cert /opt/rucio/etc/client.crt --key /opt/rucio/etc/client.key --out /tmp/x509up
       export REANA_SERVER_URL=https://reana-vre.cern.ch
       export REANA_ACCESS_TOKEN=${REANA_ACCESS_TOKEN}
       reana-client secrets-add --env VONAME=escape \
                                --env VOMSPROXY_FILE=x509up \
                                --file /tmp/x509up \
                                --env RUCIO_USERNAME=${ESCAPE_USERNAME} \
                                --env RUCIO_RUCIO_HOST=https://vre-rucio.cern.ch \
                                --env RUCIO_AUTH_HOST=https://vre-rucio-auth.cern.ch \
                                --overwrite
       reana-client create -w ${WORKFLOW_NAME}
       export REANA_WORKON=${WORKFLOW_NAME}
       reana-client upload
       reana-client start
       while true; do reana-client status; sleep 2; done
    stdin_open: true
    tty: true

## parameters.ini
[data_specs]
interferometer_list =
data_type = local
channel = STRAIN
time_shift = 0

[preprocessing]
new_sample_rate = 1024
cutoff_frequency = 1.0
segment_duration = 192
number_cropped_seconds = 2
window_downsampling = hamming
ftype = fir

[gating]
gate_data = False
gate_whiten = False
gate_tzero = 1.0
gate_tpad = 0.5
gate_threshold = 50.0
cluster_window = 0.5

[window_fft_specs]
window_fftgram = hann

[window_fft_welch_specs]
window_fftgram = hann

[density_estimation]
frequency_resolution = 0.25
N_average_segments_welch_psd = 2
coarse_grain_psd = False
coarse_grain_csd = True
overlap_factor_welch = 0.5
overlap_factor = 0.5

[postprocessing]
polarization = tensor
alpha = 0.0
fref = 10.0
flow = 5.0
fhigh = 500.0

[data_quality]
notch_list_path =
calibration_epsilon = 0.0
alphas_delta_sigma_cut = ['-5', '0', '3']
delta_sigma_cut = 0.2
return_naive_and_averaged_sigmas = False

[local_data]
local_data_path =

[output]
save_data_type = npz

## reana.yaml
version: 0.8.0
inputs:
  files:
    - parameters.ini
    - Snakefile
    - config.yaml
workflow:
  type: snakemake
  file: Snakefile

## Snakefile
from pathlib import Path

configfile: "config.yaml"

all_start_times = range(config['first_timestamp'],
                        config['last_timestamp'] + 1,
                        config['duration'])
all_end_times = range(all_start_times.start + all_start_times.step,
                      all_start_times.stop + all_start_times.step,
                      all_start_times.step)

output_path = Path(config['output_path'])
final_estimate_filename = config['final_estimate_template'].format(**config)
final_estimate_filepath = output_path / final_estimate_filename
gwf_template = config['gwf_template'].format(duration=config['duration'])
gwf_path_template = Path(config['scope']) / gwf_template
rucio_gwf_path_template = f"{config['scope']}:{gwf_template}"
rucio_get_template = f"rucio get {rucio_gwf_path_template}"

rule all:
    input: final_estimate_filepath

rule pygwb_combine:
    input:
        expand(config['estimate_file_template'],
               zip,
               t0=all_start_times,
               tf=all_end_times)
    output: final_estimate_filepath
    container: config['pygwb_container']
    shell: config['pygwb_combine_template'].format(**config)

rule run_pygwb:
    input:
        expand(gwf_path_template,
               channel=config['channels'],
               allow_missing=True)
    output: temp(config['estimate_file_template'])
    container: config['pygwb_container']
    threads: workflow.cores
    shell:
        config['pygwb_pipe_template'].format(**config).format(
            t0='{wildcards.t0}',
            tf='{wildcards.tf}',
            gwf_paths=['{input[0]}', '{input[1]}'])

rule download_data:
    output: temp(gwf_path_template)
    container: config['rucio_client_container']
    resources:
        voms_proxy=True,
        rucio=True
    threads: workflow.cores
    shell:
        rucio_get_template.format(channel='{wildcards.channel}',
                                  t0='{wildcards.t0}')