@markziemann
markziemann / genefunc.sh
Last active June 17, 2020 11:59
the goal of this script is to determine the fraction of genes in each biotype class that have annotated functions as determined by membership in either GO or REACTOME.
#!/bin/bash
# the goal of this script is to determine the fraction of genes in each
# biotype class that have annotated functions as determined by membership
# in either GO or REACTOME.
wget ftp://ftp.ensembl.org/pub/release-100/gtf/homo_sapiens/Homo_sapiens.GRCh38.100.gtf.gz
zcat Homo_sapiens.GRCh38.100.gtf.gz \
| grep -w gene \
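The gist preview is truncated here. The calculation it describes can be sketched as a function that tallies, for each biotype, how many genes appear in a list of functionally annotated gene IDs. This sketch assumes two things: the GTF gene records arrive on stdin, and the GO/Reactome-annotated Ensembl gene IDs have already been written to a one-ID-per-line file. The function name `biotype_fraction` and that ID file are hypothetical, not part of the original script.

```shell
# Sketch (hypothetical helper): per-biotype fraction of genes that are annotated.
#   $1    = file of annotated Ensembl gene IDs, one per line (assumed input)
#   stdin = Ensembl GTF records; only feature type "gene" (column 3) is used
# Output: biotype, total genes, annotated genes, fraction (tab-separated, sorted)
biotype_fraction() {
  awk -F'\t' '
    # First file: load annotated gene IDs into a lookup table
    NR == FNR { annotated[$1] = 1; next }
    # Remaining input: GTF gene records
    $3 == "gene" {
      id = ""; type = ""
      if (match($9, /gene_id "[^"]+"/))      id   = substr($9, RSTART + 9,  RLENGTH - 10)
      if (match($9, /gene_biotype "[^"]+"/)) type = substr($9, RSTART + 14, RLENGTH - 15)
      total[type]++
      if (id in annotated) hit[type]++
    }
    END {
      for (t in total)
        printf "%s\t%d\t%d\t%.3f\n", t, total[t], hit[t] + 0, (hit[t] + 0) / total[t]
    }
  ' "$1" - | sort
}
```

For example, `zcat Homo_sapiens.GRCh38.100.gtf.gz | biotype_fraction annotated_ids.txt` would print one row per biotype, where `annotated_ids.txt` is a placeholder. Because the function relies on awk's `NR == FNR` idiom to tell the two inputs apart, the ID file must not be empty.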
markziemann / remap2gmt.sh
Last active November 22, 2019 01:39
Create a gmt file for the Remap2020 dataset http://remap.univ-amu.fr/target_page/AATF:9606
#!/bin/bash
# dependencies: parallel, bedtools, pigz
# promoters
wget -N ftp://ftp.ensembl.org/pub/release-98/gtf/homo_sapiens/Homo_sapiens.GRCh38.98.gtf.gz
GTFZ=Homo_sapiens.GRCh38.98.gtf.gz
TSS=tss.bed
zcat $GTFZ | grep 'exon_number "1"' | cut -f1,4,5,7,9 \
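The preview stops after the `cut` step, which keeps chromosome, start, end, strand and attributes for first-exon records. One way to turn those five columns into a 1-bp TSS BED file is sketched below. On the plus strand the TSS is the exon start; on the minus strand it is the exon end. GTF coordinates are 1-based while BED is 0-based and half-open, so the BED start is the TSS minus one. The function name `first_exon_to_tss` and the choice of `gene_id` as the fourth column are assumptions, not taken from the original script.

```shell
# Sketch (hypothetical helper): first-exon records -> TSS intervals in BED.
# Input columns (as produced by `cut -f1,4,5,7,9`): chr, start, end, strand, attributes
# Output columns: chr, tss-1, tss, gene_id; duplicate TSSs of one gene are collapsed
first_exon_to_tss() {
  awk -F'\t' -v OFS='\t' '{
    # TSS is the exon start on "+" and the exon end on "-" (1-based GTF)
    tss = ($4 == "+") ? $2 : $3
    gene = ""
    if (match($5, /gene_id "[^"]+"/)) gene = substr($5, RSTART + 9, RLENGTH - 10)
    print $1, tss - 1, tss, gene
  }' | sort -k1,1 -k2,2n -k4,4 -u
}
```

Appending `| first_exon_to_tss > $TSS` to the pipeline above would write the result to `tss.bed`. Several transcripts of one gene often share a first exon start, which is why identical chr/start/gene rows are deduplicated.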
library(cobs)
library(quantreg)
library(parallel)
library(gplots)
interpolate<-function(dat,curve){
interpolate_points<-function(row,dat,curve){
MY_X=dat[row,1]
MY_Y=dat[row,2]
markziemann / pdftk_installer.sh
Created September 11, 2019 11:28
installer for pdftk
#!/bin/bash
#
# author: abu
# date: July 3 2019 (ver. 1.1)
# description: bash script to install pdftk on Ubuntu 18.04 for amd64 machines
##############################################################################
#
# change to /tmp directory
cd /tmp
# download packages
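Because the installer targets one specific platform, a guard before downloading anything can stop it early on other systems. The helper below is a hypothetical addition, not part of the original installer. It reads `ID` and `VERSION_ID` by sourcing an os-release file, which uses shell-compatible syntax, and compares them along with the architecture against Ubuntu 18.04 on amd64.

```shell
# Sketch (hypothetical helper): succeed only on Ubuntu 18.04 / amd64.
#   $1 = path to an os-release file (normally /etc/os-release)
#   $2 = architecture string (normally from `dpkg --print-architecture`)
check_platform() {
  local os_release=$1 arch=$2 id version
  # Source the file in subshells so its variables do not leak into the caller
  id=$(. "$os_release" && echo "$ID")
  version=$(. "$os_release" && echo "$VERSION_ID")
  if [ "$id" = "ubuntu" ] && [ "$version" = "18.04" ] && [ "$arch" = "amd64" ]; then
    return 0
  fi
  echo "unsupported platform: $id $version $arch" >&2
  return 1
}
```

A call such as `check_platform /etc/os-release "$(dpkg --print-architecture)" || exit 1` near the top of the script would abort before any packages are fetched.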
markziemann / ipr2gmt.sh
Created September 3, 2019 00:25
Create a library of gene sets based on protein domains
#!/bin/bash
# This script creates a GMT file of gene sets classified by protein domain
# First, obtain the input data from Ensembl BioMart
# Go to https://www.ensembl.org/biomart/martview/
# Select human database
# Select the following attributes:
# - Gene stable ID
# - Interpro ID
# - Interpro Short Description
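Once the BioMart table has been exported, it can be collapsed into GMT format, where each line is one gene set: set name, description, then the member genes, all tab-separated. The sketch below assumes a tab-separated export with a header row and the three attributes in the order listed. It uses the InterPro ID as the set name and the short description as the GMT description, skips genes with no InterPro ID, and drops duplicate gene/domain rows. The function name `biomart_to_gmt` is hypothetical.

```shell
# Sketch (hypothetical helper): BioMart TSV -> GMT.
#   $1 = TSV with columns Gene stable ID, Interpro ID, Interpro Short Description
#        and a header on the first line (assumed export layout)
biomart_to_gmt() {
  awk -F'\t' -v OFS='\t' '
    NR == 1  { next }                    # skip the BioMart header row
    $2 == "" { next }                    # gene without any InterPro domain
    {
      key = $2 SUBSEP $1
      if (key in seen) next              # drop duplicate gene/domain rows
      seen[key] = 1
      desc[$2] = $3
      if ($2 in genes) genes[$2] = genes[$2] OFS $1
      else             genes[$2] = $1
    }
    END { for (id in genes) print id, desc[id], genes[id] }
  ' "$1" | sort
}
```

Redirecting the output, for example `biomart_to_gmt mart_export.txt > interpro.gmt` with placeholder file names, gives a file that GMT-based enrichment tools can read.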
#!/bin/bash
echo "Hello world!"
formatdb -p F -o T -i Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.cds.all.fa
msbar -sequence mysample.fa -count 120 -point 4 -block 0 -codon 0 -outseq mysample_mutated.fa
blast2 -m 8 -p blastn -e 0.001 -d Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.cds.all.fa -i mysample_mutated.fa
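With `-m 8`, legacy BLAST writes 12 tab-separated columns per hit: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value and bit score. To see which coding sequence each mutated query maps back to, the output can be reduced to the single best hit per query, as sketched below. The function name `best_hits` is hypothetical, and ties in bit score are resolved by keeping the first hit seen.

```shell
# Sketch (hypothetical helper): keep the highest bit-score hit per query.
# Input: BLAST tabular output (-m 8) as files or on stdin; column 1 = query,
# column 12 = bit score (forced to numeric with +0 before comparing)
best_hits() {
  awk -F'\t' '
    !($1 in best) || $12 + 0 > score[$1] {
      best[$1]  = $0
      score[$1] = $12 + 0
    }
    END { for (q in best) print best[q] }
  ' "$@" | sort
}
```

Piping the `blast2` command above into `best_hits` would print one line per query.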
markziemann / text_similarity_analysis.R
Created April 22, 2019 11:36
An example of text similarity analysis using R
library(stringr)
library(text2vec)
filelist = list.files(pattern = "\\.txt$")  # only files ending in .txt
x = lapply(filelist, function(x)readLines(x))
prep_fun = function(x) {
x %>%
# make text lower case
str_to_lower %>%
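The R preview is cut off during preprocessing. As a much simpler stand-in for the text2vec workflow, and not a reproduction of it, the sketch below computes a bag-of-words cosine similarity between two text files. It lower-cases the text, mirroring the `str_to_lower` step, and splits on anything that is not an ASCII letter or digit. It applies no stemming, stop-word removal or TF-IDF weighting, and the function name `cosine_similarity` is hypothetical.

```shell
# Sketch (hypothetical helper): cosine similarity of word-count vectors.
#   $1, $2 = text files to compare (assumed non-empty)
# Prints a value between 0 (no shared words) and 1 (identical word counts)
cosine_similarity() {
  awk '
    FNR == 1 { doc++ }                   # document index: 1 for $1, 2 for $2
    {
      line = tolower($0)
      gsub(/[^a-z0-9]+/, " ", line)      # non-alphanumerics become separators
      n = split(line, words, " ")
      for (i = 1; i <= n; i++) { count[doc, words[i]]++; vocab[words[i]] = 1 }
    }
    END {
      for (w in vocab) {
        a = count[1, w] + 0; b = count[2, w] + 0
        dot += a * b; na += a * a; nb += b * b
      }
      if (na == 0 || nb == 0) { print 0; exit }
      printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
    }
  ' "$1" "$2"
}
```

Running it over every pair of the `.txt` files found by `list.files` would give a similarity matrix comparable in spirit, though not in weighting, to what text2vec produces.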