


Cedric Laczny (claczny)

#ATCC	NCBI taxonomy	Definition	GenBank	Comment
ATCC 17978	400667	Acinetobacter baumannii ATCC 17978	CP000521.1
ATCC 17982	411466	Actinomyces odontolyticus ATCC 17982	AAYI00000000.2
ATCC 10987	222523	Bacillus cereus ATCC 10987	AE017194.1
ATCC 8482	435590	Bacteroides vulgatus ATCC 8482	CP000139.1
ATCC MYA-2876	237561	Candida albicans SC5314	GCA_000182965.2
ATCC 51743	290402	Clostridium beijerinckii NCIMB 8052	CP000721.1	Found NCBI ID -> Chose 'Identical GenBank Sequence'
DSM 20539	243230	Deinococcus radiodurans R1	GCA_000008565.1
ATCC 47077	474186	Enterococcus faecalis OG1RF	NC_017316.1
ATCC 700926	511145	Escherichia coli str. K-12 substr. MG1655	U00096.3	Found via http://www.lgcstandards-atcc.org/products/all/700926.aspx?geo_country=de
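This table reads like a tab-separated file whose last column (Comment) is optional. A minimal Python sketch for loading it into a lookup keyed by strain identifier, assuming tab separators and a header line that starts with `#`; the function name `load_strains` is hypothetical, and the embedded rows are copied from the table:

```python
import csv
import io

# Two rows copied from the table; the full file has more.
TABLE = (
    "#ATCC\tNCBI taxonomy\tDefinition\tGenBank\tComment\n"
    "ATCC 17978\t400667\tAcinetobacter baumannii ATCC 17978\tCP000521.1\n"
    "ATCC 700926\t511145\tEscherichia coli str. K-12 substr. MG1655\tU00096.3\t"
    "Found via http://www.lgcstandards-atcc.org/products/all/700926.aspx?geo_country=de\n"
)


def load_strains(text):
    """Map each strain identifier to its taxonomy ID, definition, accession, comment."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # '#ATCC' marks the header line
    strains = {}
    for row in reader:
        # Rows without a comment have one column fewer; pad them.
        row += [""] * (len(header) - len(row))
        record = dict(zip(header, row))
        strains[record["ATCC"]] = {
            "taxid": int(record["NCBI taxonomy"]),
            "definition": record["Definition"],
            "genbank": record["GenBank"],
            "comment": record["Comment"],
        }
    return strains
```

Note that the first column also holds non-ATCC identifiers such as DSM 20539, so the keys are best treated as generic culture-collection IDs.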
claczny / Perl-5.18.2-foss-2015b-extended.eb
Created January 10, 2016 14:21
Extension of the Perl easyconfig that installs LWP together with the modules it needs.
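name = 'Perl'
version = '5.18.2'
# Suffix that distinguishes this build from the stock Perl 5.18.2 easyconfig
versionsuffix = '-extended'
homepage = 'http://www.perl.org/'
description = """Larry Wall's Practical Extraction and Report Language"""
toolchain = {'name': 'foss', 'version': '2015b'}
# optarch: architecture-specific optimization flags; pic: position-independent code
toolchainopts = {'optarch': True, 'pic': True}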
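The easyconfig fields determine the file name: this gist's own name, Perl-5.18.2-foss-2015b-extended.eb, follows EasyBuild's `<name>-<version>-<toolchain name>-<toolchain version><versionsuffix>.eb` convention. A small Python sketch of that mapping; the helper `easyconfig_filename` is hypothetical (not part of EasyBuild) and does not handle the special dummy toolchain:

```python
def easyconfig_filename(name, version, toolchain, versionsuffix=""):
    """Build <name>-<version>-<tc name>-<tc version><versionsuffix>.eb."""
    tc = toolchain["name"] + "-" + toolchain["version"]
    return f"{name}-{version}-{tc}{versionsuffix}.eb"
```

For example, `easyconfig_filename('Perl', '5.18.2', {'name': 'foss', 'version': '2015b'}, '-extended')` gives the gist's file name.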
claczny / gist:39bb50364dcfb1b8367e0d5d28cf57e7
Created June 16, 2016 07:13
Opening iTerm From a Finder Directory
on run {input, parameters}
    tell application "Finder"
        set dir_path to quoted form of (POSIX path of (folder of the front window as alias))
    end tell
    CD_to(dir_path)
end run

on CD_to(theDir)
    tell application "iTerm"
        activate
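`quoted form of` wraps the Finder path in single quotes so that directories with spaces or apostrophes survive the later `cd`. The same idea in Python, using the standard library's `shlex.quote` as a stand-in (the function `cd_command` is hypothetical; unlike `quoted form of`, `shlex.quote` leaves paths without special characters unquoted):

```python
import shlex


def cd_command(dir_path):
    """Return a shell `cd` command for dir_path, safe for spaces and quotes."""
    # shlex.quote plays the role of AppleScript's `quoted form of` here.
    return "cd " + shlex.quote(dir_path)
```

A path such as `/Users/me/My Folder` becomes `cd '/Users/me/My Folder'`, which the shell parses as a single argument.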
claczny / Makefile.pacbio_extract_fasta
Last active August 8, 2016 09:52
A Makefile to extract FASTA sequences from PacBio files using the dextractor module from DAZZLER.
SHELL=/bin/bash
MOVIE=YOUR_MOVIE_ID
FASTA=$(MOVIE).fasta
#QUIVA=$(MOVIE).quiva
BAS_H5=$(MOVIE).bas.h5
BAX_1_H5=$(MOVIE).1.bax.h5
BAX_2_H5=$(MOVIE).2.bax.h5
BAX_3_H5=$(MOVIE).3.bax.h5
METADATA_XML=$(MOVIE).metadata.xml
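Every file name in this Makefile is derived from MOVIE. A Python sketch that builds the same names and reports which input files are missing before extraction is attempted; the helper names are hypothetical, and treating FASTA as the only output is an assumption based on the variable names:

```python
from pathlib import Path


def pacbio_movie_files(movie):
    """Mirror the Makefile variables: every file name is derived from MOVIE."""
    files = {
        "FASTA": f"{movie}.fasta",  # assumed to be the extraction output
        "BAS_H5": f"{movie}.bas.h5",
        "METADATA_XML": f"{movie}.metadata.xml",
    }
    # The three .bax.h5 part files, numbered 1 to 3
    for part in (1, 2, 3):
        files[f"BAX_{part}_H5"] = f"{movie}.{part}.bax.h5"
    return files


def missing_inputs(movie, directory="."):
    """Return the input files (everything except FASTA) absent from directory."""
    files = pacbio_movie_files(movie)
    return sorted(
        name
        for key, name in files.items()
        if key != "FASTA" and not (Path(directory) / name).exists()
    )
```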
claczny / Makefile.cmp_bash5tools_dextractor
Last active August 8, 2016 11:24
Makefile to compare bash5tools and the dextractor module, both of which extract FASTA-formatted reads from PacBio data. Includes downloading the *large* raw PacBio data.
SHELL=/bin/bash
MOVIE=m151020_151817_00127_c100889452550000001823187103261622_s1_p0
FASTA=$(MOVIE).fasta
#QUIVA=$(MOVIE).quiva
BAS_H5=$(MOVIE).bas.h5
BAX_1_H5=$(MOVIE).1.bax.h5
BAX_2_H5=$(MOVIE).2.bax.h5
BAX_3_H5=$(MOVIE).3.bax.h5
METADATA_XML=$(MOVIE).metadata.xml
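One way to compare the two extractors' outputs is to match reads by header and check their sequences. A minimal Python sketch, assuming both tools write FASTA whose first header token identifies a read in the same way (if bash5tools and dextractor format headers differently, the IDs would need normalizing first); `read_fasta` and `compare_fasta` are hypothetical names:

```python
def read_fasta(lines):
    """Parse FASTA lines into {read_id: sequence}, keyed on the first header token."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def compare_fasta(a, b):
    """Reads only in a, only in b, and shared reads whose sequences differ."""
    shared = a.keys() & b.keys()
    return {
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
        "different": sorted(r for r in shared if a[r] != b[r]),
    }
```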
claczny / Kraken.mk
Last active August 31, 2016 09:15
A Makefile to run Kraken, a taxonomic sequence classifier, and to filter its results and create reports from them.
SHELL := /bin/bash
#####
# PRESETS
#####
FASTA?=YOUR-FASTA-FILE.fa
KRAKEN_PATH?=KRAKEN-BINARY-TOP-FOLDER
KRAKEN_CORES?=10
KRAKEN_FILTER_THRESHOLD?=0.2
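KRAKEN_FILTER_THRESHOLD presumably feeds the confidence filter of Kraken version 1, kraken-filter. As described in the Kraken 1 manual, a label's score is C/Q, where C counts the read's k-mers mapped to taxa in the clade rooted at the label and Q counts the k-mers without ambiguous bases; a label scoring below the threshold is moved up the taxonomy until it passes or the read becomes unclassified. A Python sketch of that rule, assuming Kraken 1's per-read `taxid:count` k-mer column (`A` for ambiguous, `0` for no hit) and a parent map you supply; it illustrates the rule and is not kraken-filter's code:

```python
def parse_kmer_lcas(field):
    """Parse Kraken's per-read k-mer column, e.g. '562:13 561:4 A:31 0:1'."""
    pairs = []
    for token in field.split():
        taxon, count = token.rsplit(":", 1)
        pairs.append((taxon, int(count)))
    return pairs


def in_clade(taxid, root, parent):
    """True if taxid equals root or has root as an ancestor."""
    while taxid is not None:
        if taxid == root:
            return True
        taxid = parent.get(taxid)
    return False


def confidence(pairs, label, parent):
    """C/Q: k-mers in the label's clade over k-mers without ambiguous bases."""
    queried = sum(n for taxon, n in pairs if taxon != "A")
    in_label = sum(
        n for taxon, n in pairs
        if taxon not in ("A", "0") and in_clade(int(taxon), label, parent)
    )
    return in_label / queried if queried else 0.0


def filter_label(pairs, label, parent, threshold=0.2):
    """Move label up the tree until its score reaches threshold; 0 = unclassified."""
    while label is not None:
        if confidence(pairs, label, parent) >= threshold:
            return label
        label = parent.get(label)
    return 0
```

With a toy taxonomy `{562: 561, 561: 543, 543: 1, 1: None}` and the k-mer column `562:13 561:4 A:31 0:1`, the read keeps label 562 at a threshold of 0.2 (score 13/18) but is moved to 561 at 0.8 (score 17/18).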
claczny / compute_coverage.mk
Created September 5, 2016 08:22
A Makefile to compute the average genome coverage, the coverage distribution, and other summaries from an input BAM file. N.B. Secondary expansion (.SECONDEXPANSION) is used to create and populate a dedicated directory per sample.
SHELL=/bin/bash
SAMPLE?=<YOUR_SAMPLE>
DOUBLED_SAMPLE = $(SAMPLE)/$(SAMPLE)
RDIR?=results
DDIR?=data
#####
# BEAUTY TARGETS
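Average genome coverage and the coverage distribution can both be derived from per-position depths. A Python sketch, assuming input in the three-column format of `samtools depth -a` (reference, position, depth), where `-a` also reports zero-depth positions so that uncovered bases count towards the average; the Makefile's actual commands are not visible in this preview, and `coverage_summary` is a hypothetical name:

```python
from collections import Counter


def coverage_summary(depth_lines):
    """Mean depth and depth histogram from 'ref<TAB>pos<TAB>depth' lines."""
    histogram = Counter()  # depth -> number of positions with that depth
    for line in depth_lines:
        _ref, _pos, depth = line.rstrip("\n").split("\t")
        histogram[int(depth)] += 1
    positions = sum(histogram.values())
    total = sum(d * n for d, n in histogram.items())
    mean = total / positions if positions else 0.0
    return mean, dict(histogram)
```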
claczny / fit_and_plot_normal_mixtures.R
Created September 6, 2016 11:08
Fit a mixture of normal distributions to a multimodal distribution and plot the fitted components.
library(ggplot2)
library(plyr)
library(mixtools)
###
# GLOBAL THEME AND GLOBAL AESTHETICS
###
old <- theme_set(theme_bw() +
                   theme(text = element_text(size = 12),
                         axis.title = element_text(size = 14, face = "bold"),
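mixtools fits normal mixtures by expectation-maximization (EM), for example with its `normalmixEM` function; the preview does not show the gist's exact call. For illustration, a plain-Python sketch of EM for a one-dimensional mixture of normals, starting from user-supplied initial guesses (the function names are hypothetical):

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_normal_mixture(data, mus, sigmas, weights, n_iter=200, tol=1e-8):
    """EM for a 1-D mixture of normals; returns (means, std devs, weights)."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    k = len(mus)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp, ll = [], 0.0
        for x in data:
            dens = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-12))  # floor avoids a zero variance
        # Stop once the log-likelihood no longer improves noticeably
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mus, sigmas, weights
```

Apart from the variance floor, this sketch has no safeguards against degenerate fits, so sensible starting values matter.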
claczny / fuzzymatch_titles.py
Created January 6, 2017 15:45
Python code to fuzzy-match two files (A and B) of titles to find titles missing from B, i.e., titles that occur more than once in A. Not very efficient, but it does the job.
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from collections import Counter
A_title_file = "/tmp/A_titles.txt"
B_title_file = "/tmp/B_titles.txt"
# Open the files and get the titles
A_titles = []
with open(A_title_file) as f:
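The core idea: for each title in A, find its best match in B and flag it when the similarity stays below a cutoff; a Counter then exposes titles that appear several times in A. A sketch of that idea using the standard library's `difflib.SequenceMatcher` in place of fuzzywuzzy (a swapped-in technique; the 0.9 cutoff, the normalization and the function names are assumptions), with the same quadratic number of comparisons the description admits to:

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(title):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(title.lower().split())


def best_match(title, candidates):
    """Return (candidate, similarity in [0, 1]) of the closest candidate."""
    best, best_score = None, 0.0
    for c in candidates:
        score = SequenceMatcher(None, normalize(title), normalize(c)).ratio()
        if score > best_score:
            best, best_score = c, score
    return best, best_score


def unmatched_titles(a_titles, b_titles, cutoff=0.9):
    """Titles in A without a sufficiently similar title in B."""
    return [t for t in a_titles if best_match(t, b_titles)[1] < cutoff]


def duplicated_titles(a_titles):
    """Normalized titles occurring more than once in A, with their counts."""
    counts = Counter(normalize(t) for t in a_titles)
    return {t: n for t, n in counts.items() if n > 1}
```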
claczny / Makefile.amplicon_minion.benitez-paez_sanz
Created June 10, 2017 04:53
Makefile for downloading full-length amplicon sequencing MinION data from Benitez-Paez & Sanz: http://biorxiv.org/content/early/2017/06/08/117143
SHELL = /bin/bash
DDIR = orig_data
RDIR = results
PORETOOLS_BIN = . venv/bin/activate; poretools
PULLSEQ_BIN = ml vizbins_little_helpers; pullseq
MIN_LENGTH ?= 1000
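`MIN_LENGTH ?= 1000` suggests that reads shorter than 1,000 bases are dropped at some step; the preview does not show where. A minimal Python sketch of such a length filter over FASTQ input, assuming unwrapped four-line records and that reads of exactly MIN_LENGTH are kept (the helper names are hypothetical):

```python
def parse_fastq(lines):
    """Yield (read_id, sequence) from unwrapped four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:].split()[0], seq


def long_reads(records, min_length=1000):
    """Keep reads whose sequence is at least min_length bases long."""
    for name, seq in records:
        if len(seq) >= min_length:
            yield name, seq
```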