Upendra Kumar Devisetty upendrak

## dockerizing.md

      
crashfrog / dockerizing.md, last active February 18, 2016

Dockerizing tools from source

Justin Payne, May 30 2015
Introduction

I get a lot of value out of putting bioinformatics tools in Docker containers. Once a tool is containerized with an automated build script (called a "Dockerfile"), it's really easy to keep it up to date and to manage its installation on different machines without individual dependencies stepping all over each other. It's also a convenient way to try out new tools without taking the risk of borking your carefully maintained "working" Linux install. Tools like boot2docker make Docker images runnable on non-Linux platforms as well, which improves the portability of tools that might have platform-specific (or even distro-specific) dependencies. I've done this a couple of times now, so I thought I'd share some tips and tricks.

  
## econ_data_wrangle_clean.R
library(tidyverse)
library(rio)
library(rvest)
library(janitor)


# R code to fetch country codes
country_codes <- read_html("http://web.stanford.edu/~chadj/countrycodes6.3") %>%
  html_text() %>%
  str_extract_all("[A-Z]{3}") %>%
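The extraction step above relies on the pattern `[A-Z]{3}`. As a rough illustration of what that pattern does and does not match, here is a minimal Python sketch. The sample text is made up for the example and is not the layout of the Stanford page, and `extract_country_codes` is a hypothetical helper name.

```python
import re


def extract_country_codes(page_text):
    """Return every non-overlapping run of three capital letters, left to right."""
    return re.findall(r"[A-Z]{3}", page_text)


# Made-up sample text; the real page layout may differ.
sample = "Albania ALB 1\nUnited States USA 2\n"
print(extract_country_codes(sample))

# The pattern is not anchored, so a longer run of capitals is cut into
# three-letter pieces instead of being skipped.
print(extract_country_codes("NATO"))
```

If stray matches turn out to be a problem, one option is to anchor the pattern with word boundaries (`\b[A-Z]{3}\b`), but whether that is needed depends on the page contents.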

## shiny module snippet
snippet module
	${1:name}UI <- function(id){
		ns <- NS(id)
		tagList(

		)
	}

	${1:name} <- function(input, output, session){


## extract_transcript_intron.sh
## Requires bedtools
BIN='/home/hirak/bedtools2/bin'
## Gencode
## gencode.v29.chr_patch_hapl_scaff.annotation.gtf
GTF_FILE="gencode.v29.chr_patch_hapl_scaff.annotation.gtf"

# extract transcript boundaries
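# BED starts are 0-based, so the 1-based GTF start ($4) is shifted down by one;
# with awk's default whitespace splitting, $12 is the quoted transcript_id value
awk 'BEGIN{OFS="\t";} $3=="transcript" {print $1,$4-1,$5,$12}' "$GTF_FILE" | tr -d '";' | "$BIN"/sortBed > gencode_transcript_intervals.bed

# merge exon boundaries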
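For readers who want to see the transcript-interval step spelled out, the following is a minimal Python sketch of the same idea. It is not the script's method: it looks up `transcript_id` by key in the attribute column instead of relying on field position, and it only approximates `sortBed` with a plain sort on chromosome and start. The function names and the toy GTF lines are hypothetical.

```python
def transcript_id(attributes):
    """Pull the transcript_id value out of a GTF attribute string."""
    for item in attributes.split(";"):
        key, _, value = item.strip().partition(" ")
        if key == "transcript_id":
            return value.strip('"')
    return None


def gtf_transcripts_to_bed(lines):
    """Return sorted (chrom, start0, end, transcript_id) tuples for transcript rows."""
    records = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        # GTF is 1-based inclusive, BED is 0-based half-open: shift the start only.
        records.append(
            (fields[0], int(fields[3]) - 1, int(fields[4]), transcript_id(fields[8]))
        )
    # Rough stand-in for sortBed: order by chromosome name, then start.
    return sorted(records, key=lambda rec: (rec[0], rec[1]))


# Toy input, invented for illustration only.
toy_gtf = [
    "##description: toy example\n",
    'chr1\tTOY\tgene\t100\t200\t.\t+\t.\tgene_id "GENE1";\n',
    'chr1\tTOY\ttranscript\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";\n',
]
for rec in gtf_transcripts_to_bed(toy_gtf):
    print("\t".join(map(str, rec)))
```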

## md_to_rst.sh
# This script was created to convert a directory full
# of markdown files into rst equivalents. It uses
# pandoc to do the conversion.
#
# 1. Install pandoc from http://johnmacfarlane.net/pandoc/
# 2. Copy this script into the directory containing the .md files
# 3. Ensure that the script has execute permissions
# 4. Run the script
#
# By default this will keep the original .md file
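The workflow described in the comments above (one pandoc call per `.md` file, with the original kept) can be sketched in Python as follows. This is an illustration under assumptions, not the script itself: `pandoc_command` and `convert_directory` are hypothetical names, and the sketch only builds the commands unless `run=True` is passed, in which case pandoc must be installed.

```python
import subprocess
from pathlib import Path


def pandoc_command(md_path):
    """Build the pandoc invocation that writes an .rst file next to the .md file."""
    md_path = Path(md_path)
    rst_path = md_path.with_suffix(".rst")
    return ["pandoc", "-f", "markdown", "-t", "rst", "-o", str(rst_path), str(md_path)]


def convert_directory(directory=".", run=False):
    """Return one pandoc command per .md file; execute them only when run=True."""
    commands = [pandoc_command(path) for path in sorted(Path(directory).glob("*.md"))]
    if run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


# Dry run: show what would be executed for the current directory.
for command in convert_directory("."):
    print(" ".join(command))
```

Because the output path is derived with `with_suffix`, the source `.md` file is never overwritten, which matches the default behaviour the comments describe.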

## docker2singularity.sh
#!/bin/bash

# Based on https://github.com/sylabs/singularity/issues/1537
# Usage: bash docker2singularity.sh mydockerimg mysingularity.simg

set -ueo pipefail

IMG=$1
FILEOUT=$2
PORT=${3:-5000}

## bst.R
# Bootstrap tour step
.bsTourStep <- function(i, id, title, content, pos = "right", tab = NULL){
  id <- paste0("#", id)
  x <- paste0("    {\n      element: \"", id,
              "\",\n      title: \"", title,
              "\",\n      content: \"", content,
              "\",\n      placement: \"", pos, "\"")
  if(i==1 && !is.null(tab)){
    x <- paste0(x, ",\n      onShow: function (tour) { $(\"", tab, "\").tab('show');}\n    }")
  } else {

## renderSite.R
# This script builds on Aleszu Bajak's excellent
# [tutorial on building a course website using R Markdown and Github pages](http://www.storybench.org/convert-google-doc-rmarkdown-publish-github-pages/).
# I was excited about the concept but wanted to automate a few of the production steps: namely generating the HTML files
# for the site from the RMD pages (which Aleszu describes doing one-by-one) and generating the site navigation menu,
# which Aleszu handcodes in the `_site.yml` file. This script should automate both processes, though it may have some quirks
# unique to my setup that you'd want to tweak to fit your own. It's likely more loquacious than necessary as well, so feel free
# to condense as you can. Ideally, each time you make updates to your RMD files you can run this script to generate updated HTML
# pages and a new `_site.yml`. Then commit changes to Github and you're up and running!
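To make the navigation-menu step concrete, here is a small Python sketch of generating a `_site.yml` navbar from a list of Rmd file names. It is only an illustration of the idea, not the R script that follows: the helper names, the default site name and title, and the rule for turning a file name into menu text are all assumptions, and rendering the HTML pages is left out.

```python
from pathlib import Path


def nav_entry(rmd_name):
    """Map an Rmd file name to (menu text, href) for the navbar."""
    stem = Path(rmd_name).stem
    if stem == "index":
        text = "Home"
    else:
        text = stem.replace("_", " ").replace("-", " ").title()
    return text, f"{stem}.html"


def build_site_yml(rmd_names, site_name="my-site", title="My Site"):
    """Return the text of a minimal _site.yml with one left-hand navbar entry per page."""
    lines = [f'name: "{site_name}"', "navbar:", f'  title: "{title}"', "  left:"]
    for name in rmd_names:
        text, href = nav_entry(name)
        lines.append(f'    - text: "{text}"')
        lines.append(f"      href: {href}")
    return "\n".join(lines) + "\n"


# Hypothetical page list; a real run would collect the names from the site folder.
print(build_site_yml(["index.Rmd", "week_1.Rmd"]))
```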

# Once you've got everything configured for your own site below, you should be able to run `source('rend

## optimization.py
from dnachisel import *

# Subbed in `CCTCCT` for `AAAGTT` to account for proline substitution
virus = 'ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCAGTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCCCTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCATTTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGCGAATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGGTAATTTCAAAAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTATTAATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTATTAACATCACTAGGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCAGGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATAATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTTGAAATCCTTCAC

## Spark Dataframe Cheat Sheet.py
# A simple cheat sheet of Spark Dataframe syntax
# Current for Spark 1.6.1

# import statements
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql.functions import *

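# creating dataframes
# (sqlContext is predefined in the pyspark shell; in a standalone script, create it with SQLContext(sc))
df = sqlContext.createDataFrame([(1, 4), (2, 5), (3, 6)], ["A", "B"]) # from manual data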