@obenshaindw
obenshaindw / nda-interactive-resume.py
Last active September 24, 2020 15:15
Python script for interactively resuming an NDA submission. This script lets the user select a local file for each remaining file to be uploaded. This has an advantage over the nda-tools vtcmd command-line client, which attempts to find each file itself and can be quite inefficient when it has multiple directories to scan.
import boto3
from botocore.exceptions import ClientError
import requests
import getpass
import os
import time
username = input('Username:')
password = getpass.getpass()
@obenshaindw
obenshaindw / mff-zipper.sh
Last active January 15, 2019 20:36
Package MFF files into zip files
#!/bin/bash
MFF_DIRECTORY=$1
for mffzip in "$MFF_DIRECTORY"*.mff.zip; do
echo "Renaming $mffzip directories to just ${mffzip%.zip}"
mv "$mffzip" "${mffzip%.zip}";
done
for mff in *.mff; do
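The rename loop relies on the `${var%.zip}` suffix-stripping expansion. A minimal sketch of that step, assuming a throwaway directory and hypothetical `subject01`/`subject02` names that are not part of the original script:

```bash
# Work in a temporary directory so nothing real is renamed.
workdir=$(mktemp -d)
mkdir "$workdir/subject01.mff.zip" "$workdir/subject02.mff.zip"

for mffzip in "$workdir"/*.mff.zip; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "$mffzip" ] || continue
    # ${mffzip%.zip} removes the shortest trailing match of ".zip".
    mv "$mffzip" "${mffzip%.zip}"
done

ls "$workdir"
```

The `[ -e ... ] || continue` guard is an addition for safety; without it, an empty directory would make `mv` run on the unexpanded pattern.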
@obenshaindw
obenshaindw / refresh_nda_token.sh
Created March 14, 2018 04:47
Bash function to update AWS FederationToken provided by NIMH Data Archive
#!/bin/bash
## NDA AWS Token Generator
## Author: NIMH Data Archives
## http://ndar.nih.gov
## License: MIT
## https://opensource.org/licenses/MIT
##############################################################################
#
# Script to retrieve generated AWS Tokens from NIMHDA
@obenshaindw
obenshaindw / gist:bb6c2b4cf2aa7028813a
Created August 6, 2015 17:51
Stream large files from S3 (e.g., FASTQ)
#!/bin/bash
# Pass in s3 URL=$1
# Set up Pathing
## Drop s3://
pname=${1#*//}
## Drop Bucket Name, e.g., NDAR_Central*, NDAR_Results
pname=${pname#*/}
## Get text after last /
fname=${1##*/}
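The three parameter expansions can be traced on a sample value. A small sketch, assuming a hypothetical URL (`example-bucket` and the key below are made up for illustration):

```bash
# Hypothetical S3 URL, used only for illustration.
url="s3://example-bucket/submission_1234/reads/sample_R1.fastq.gz"

# "#*//" removes the shortest prefix ending in "//", i.e. the "s3://" scheme.
pname=${url#*//}
# "#*/" removes the shortest prefix ending in "/", i.e. the bucket name.
pname=${pname#*/}
# "##*/" removes the longest prefix ending in "/", leaving only the file name.
fname=${url##*/}

echo "key:  $pname"
echo "file: $fname"
```

A single `#` strips the shortest matching prefix and `##` the longest, which is why the same `*/` pattern yields the key in one case and the bare file name in the other.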
@obenshaindw
obenshaindw / Zip files in s3
Last active August 29, 2015 14:14
get files from s3, zip, and put back into s3
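echo "$1"
# Use a grep regex to extract the trailing numeric folder of the s3 URL and reuse it as the zip file name.
folder=$(echo "$1" | grep -Eo '/[0-9]+/$' | grep -Eo '[0-9]+')
# Stop if nothing matched; an empty $folder would point the commands below at the current directory.
[ -n "$folder" ] || { echo "No numeric folder found in $1" >&2; exit 1; }
mkdir -p "./$folder"
echo s3cmd get --recursive "$1" "./$folder"
s3cmd get --recursive "$1" "./$folder"
echo zip -r "$folder" "./$folder"/*
zip -r "$folder" "./$folder"/*
echo rm -rf "./$folder/"
rm -rf "./$folder/"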
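The folder-name extraction can be checked in isolation before any download runs. A minimal sketch, assuming a hypothetical helper name `extract_folder` and made-up bucket paths:

```bash
# Hypothetical helper: print the trailing numeric path segment of an S3 URL.
extract_folder() {
    # The first grep keeps only "/<digits>/" at the very end of the URL;
    # the second grep strips the surrounding slashes.
    printf '%s\n' "$1" | grep -Eo '/[0-9]+/$' | grep -Eo '[0-9]+'
}

extract_folder "s3://example-bucket/results/12345/"

# A URL without a trailing numeric segment prints nothing,
# so callers should check for an empty result.
extract_folder "s3://example-bucket/results/" || echo "no numeric folder found"
```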
@obenshaindw
obenshaindw / extract-genotypes.pl
Created February 4, 2015 15:47
Extract genotypes from multisample VCF file using vcftools
use strict;
use warnings;
use Vcf;
my $filename = $ARGV[0];
open ( my $handle, "<", $filename) or die "Cannot open $filename: $!";
my $vcf = Vcf->new(fh=>$handle);
$vcf->parse_header();
vcf_iterate();
@obenshaindw
obenshaindw / Stream VCF from S3
Last active April 6, 2023 09:45
Stream VCF file from AWS s3 and do stuff (sort, gzip, index, subset for specific region)
#!/usr/bin/bash
#
# make_gz.sh
#
# Call this script with a list of s3 locations with VCF files to parse
# aws --profile NDAR s3 ls s3://S3_URL/ | awk '{print $4}' | xargs -n1 -P4 sh make_gz.sh
# xargs -n1 passes one argument per invocation; -P4 runs up to 4 processes in parallel
#
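The `xargs -n1 -P4` fan-out from the usage comment can be tried without S3 access. A minimal sketch, assuming hypothetical file names and an `echo` standing in for `make_gz.sh`:

```bash
# Feed three hypothetical file names to xargs. Each one is handed to its own
# "sh -c" invocation (-n1), with up to four running at once (-P4).
# The "_" fills $0 so that the file name arrives as $1.
result=$(printf '%s\n' sample1.vcf sample2.vcf sample3.vcf |
    xargs -n1 -P4 sh -c 'echo "processing $1"' _)

# Parallel jobs may finish in any order, so sort before displaying.
printf '%s\n' "$result" | sort
```

Note that `-P` is supported by GNU and BSD `xargs` but is not part of POSIX.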
@obenshaindw
obenshaindw / Add dbSNP IDs to a VCF file
Last active August 21, 2023 21:47
Add dbSNP IDs to a VCF file that doesn't have them.
#GATK Method <- Slower and keeps original ID plus dbSNP rsID
# R=Reference FASTA
# V=VCF file to add IDs to
# --dbsnp = dbsnp VCF -- download from NCBI FTP
java -jar GenomeAnalysisTK.jar -R /reference/Homo_sapiens_assembly19.fasta -T VariantAnnotator -V vcf_to_add_id_to.vcf --dbsnp /reference/dbsnp_137.b37.vcf.gz --out /data/Broad.chr1.annotated.vcf
#bcftools Method <- Faster, replaces existing ID with dbSNP rsID
/usr/bin/htslib/bcftools/bcftools annotate -a /reference/dbsnp_137.b37.vcf.gz -c ID vcf_to_add_id_to.vcf
@obenshaindw
obenshaindw / Reheader a VCF file
Last active August 29, 2015 14:14
Reheader VCF
# Extract the header only (-h)
/usr/bin/htslib/bcftools/bcftools view -h vcf_with_bad_header.vcf > vcf_header.vcf
vim vcf_header.vcf
#Make changes to header
/usr/bin/htslib/bcftools/bcftools reheader -h vcf_header.vcf -o reheadered.vcf vcf_with_bad_header.vcf
@obenshaindw
obenshaindw / Fix Chromosome Name in a VCF
Last active August 29, 2015 14:14
Fix Chromosome Name in VCF
/usr/bin/htslib/bcftools/bcftools view vcf_with_chr.vcf | sed -e 's/^chr//' -e 's/^##contig=<ID=chr/##contig=<ID=/' | /usr/bin/htslib/htslib/bgzip -c > BCM_hg19.reheader.no_chr.vcf.gz
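Stripping the prefix only where it names a chromosome avoids touching other text that happens to contain `chr`. A minimal sketch on an inline sample, assuming hypothetical helper names (`sample_vcf`, `strip_chr`) and made-up records; it needs neither bcftools nor bgzip:

```bash
# Hypothetical helper: emit a tiny VCF whose INFO description contains "chr".
sample_vcf() {
    printf '##fileformat=VCFv4.1\n'
    printf '##contig=<ID=chr1,length=249250621>\n'
    printf '##INFO=<ID=NOTE,Number=1,Type=String,Description="chromatin mark">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t12345\t.\tA\tG\t50\tPASS\tNOTE=x\n'
}

# Hypothetical helper: remove "chr" only at the start of data lines
# and inside ##contig header IDs.
strip_chr() {
    sed -e 's/^chr//' -e 's/^##contig=<ID=chr/##contig=<ID=/'
}

sample_vcf | strip_chr
```

In this sample the word "chromatin" in the INFO description is left intact, while the contig ID and the CHROM column both become `1`.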