@obenshaindw
Created August 6, 2015 17:51
Stream large files from s3 (e.g., FASTQ)
#!/bin/bash
# Usage: pass the s3 URL as the first argument ($1)
# Set up paths
## Drop s3://
pname=${1#*//}
## Drop Bucket Name, i.e., NDAR_Central*, NDAR_Results, etc.
pname=${pname#*/}
## Get text after last /
fname=${1##*/}
## Get text before last / (the directory part of the key)
full_path=${pname%/*}
# Create directories if they do not exist
if [ ! -e "/data/$full_path" ]
then
mkdir -p "/data/$full_path"
fi
# Create the FIFO if it does not exist
if [ ! -e "/data/$pname" ]
then
mkfifo "/data/$pname"
fi
# Stream the s3 object into the FIFO in the background.
# Picard's FifoBuffer provides a large in-memory buffer between s3cmd and the FIFO.
s3cmd get "$1" - | java -jar /home/obenshaindw/picard-tools-1.135/picard.jar FifoBuffer > "/data/$full_path/$fname" &
# Read from the FIFO to start streaming data from the s3 object, for example:
# samtools view -H /data/$pname
# bwa mem -M -t 16 ref.fa /data/fifo_for_fastq1.gz /data/fifo_for_fastq2.gz > aln.sam
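
The path handling at the top of the script can be traced with a sample URL. The sketch below is separate from the script and only illustrates what each parameter expansion yields. The bucket and key names are hypothetical, and it assumes the key contains at least one directory component (with a bare `s3://bucket/file` key, `full_path` would equal the file name).

```shell
#!/bin/bash
# Hypothetical s3 URL, used only to illustrate the expansions.
url="s3://example-bucket/project/run1/sample_R1.fastq.gz"

pname=${url#*//}        # drop "s3://"        -> example-bucket/project/run1/sample_R1.fastq.gz
pname=${pname#*/}       # drop the bucket     -> project/run1/sample_R1.fastq.gz
fname=${url##*/}        # text after last /   -> sample_R1.fastq.gz
full_path=${pname%/*}   # text before last /  -> project/run1

printf '%s\n' "$pname" "$fname" "$full_path"
```

With these values, the FIFO would be created at /data/project/run1/sample_R1.fastq.gz, which is the same path as /data/$full_path/$fname.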