@tthyer
Created August 17, 2021 18:40
Pre-run script for a Nextflow workflow that copies some Synapse files to an S3 bucket
#! /usr/bin/bash
set -ex
amazon-linux-extras install python3.8 --yes
python3.8 -m pip install --upgrade pip
python3.8 -m pip install synapseclient # boto3 should already be installed; if not, install it here too
inputs_bucket='some-inputs-bucket' # TODO rename with your bucket
# requires that the synapseConfig be staged in a bucket accessible to the nextflow iam user
aws s3 cp "s3://${inputs_bucket}/synapse-config/synapseConfig" "${HOME}/.synapseConfig"
datadir="${PWD}/data"
mkdir -p "${datadir}"
OUTPUT_FILE="${datadir}/syn-fetch.py"
cat > "$OUTPUT_FILE" << EOM
import boto3
import json
import os
import synapseclient
s3 = boto3.client('s3')
bucket = "${inputs_bucket}"
prefix_key = 'some-s3-prefix-key-for-objects' # TODO rename this prefix key
response = s3.list_objects(
    Bucket=bucket,
    Prefix=prefix_key
)
# 'Contents' is missing from the response when no objects match the prefix
keys = [entry['Key'] for entry in response.get('Contents', [])]
syn = synapseclient.login()
folder_id = 'syn01234567' # TODO use a real synapse ID
# getChildren returns a generator; materialize it so it can be iterated more than once
files = list(syn.getChildren(folder_id, includeTypes=['file'], sortBy='NAME'))
syn_ids = [file['id'] for file in files if '_I1_' not in file['name']]
downloadLocation = f'{os.getcwd()}/data/'
for file in files:
    filename = file['name']
    filekey = f'{prefix_key}/{filename}'
    # if the file is not in our S3 bucket, copy from Synapse to there
    if filekey not in keys:
        syn_id = file['id']
        entity = syn.get(syn_id, downloadLocation=downloadLocation)
        # upload_file sends the file contents; put_object with Body=entity.path
        # would store the path string itself as the object body
        s3.upload_file(entity.path, bucket, filekey)
        os.remove(entity.path)
EOM
python3.8 "${OUTPUT_FILE}"
set +ex
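
A single `list_objects` call returns at most 1,000 keys, so the existence check in the script can miss objects when the prefix holds more than that. Below is a minimal sketch of a paginated listing, assuming a boto3 S3 client is passed in. `list_keys` is a hypothetical helper name and is not part of the original script.

```python
def list_keys(s3_client, bucket, prefix):
    """Return the set of all object keys under `prefix`, following pagination.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = set()
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix
        for entry in page.get("Contents", []):
            keys.add(entry["Key"])
    return keys
```

In the embedded Python, this could stand in for the `list_objects` call as `keys = list_keys(s3, bucket, prefix_key)`. Returning a set also makes the `filekey not in keys` check a constant-time lookup.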