Sebastian Kurscheid (skurscheid)

@skurscheid
skurscheid / getHostname.R
Last active October 27, 2016 22:53
Retrieve host/machine name in your R session
Sys.info()["nodename"]
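A shell counterpart, as a minimal sketch: on Unix-alikes `Sys.info()` fills `nodename` from the `uname` system call, so `uname -n` should print the same name. The variable names `node` and `short` are illustrative only.

```bash
# Full node name, e.g. "myhost.example.org" (same source as Sys.info()["nodename"])
node=$(uname -n)
# Short host name: everything before the first "."
short=${node%%.*}
echo "node: $node, short: $short"
```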
skurscheid / get_overrepresented_sequences_from_fastqc.awk
Last active July 1, 2016 04:09
extract over-represented sequences from a FastQC report file (fastqc_data.txt) as FASTA
awk '$1 == "Filename" {file = $2; next} $1 ~ /^[ACGTN]+$/ && length($1) > 40 {i++; printf ">%s_%d\n%s\n", file, i, $1}' < fastqc_data.txt
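The one-liner keys on any long A/C/G/T token anywhere in the report. A stricter variant, sketched below under the assumption that fastqc_data.txt is tab-separated and wraps each module in `>>Module name` / `>>END_MODULE` lines, reads only the rows of the "Overrepresented sequences" module. The function name `overrep_to_fasta` and the `count=` header field are illustrative choices, not part of FastQC.

```bash
# Hypothetical helper: print the rows of FastQC's "Overrepresented sequences"
# module in file $1 as FASTA, one record per sequence, e.g.
#   >S1.fastq.gz_1 count=42
#   AAAACCCCGGGGTTTT
overrep_to_fasta() {
  awk -F'\t' '
    $1 == "Filename"                    { file = $2; next }   # sample name
    $1 == ">>Overrepresented sequences" { in_mod = 1; next }  # module start
    $1 == ">>END_MODULE"                { in_mod = 0; next }  # any module end
    in_mod && $1 !~ /^#/                { n++; printf ">%s_%d count=%s\n%s\n", file, n, $2, $1 }
  ' "$1"
}
```

Usage: `overrep_to_fasta fastqc_data.txt > overrepresented.fa`.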
skurscheid / file_to_subdir.sh
Created July 7, 2016 04:43
move .gz files into subdirectories named after the first two "_"-separated parts of their file names
for f in *.gz; do d=$(echo "$f" | cut -f 1,2 -d "_"); mkdir -p "$d"; mv "$f" "$d"/; done
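A sketch of the same idea as a reusable bash function (the name `files_to_subdirs` is hypothetical). It takes the prefix from each file name with parameter expansion instead of `ls | cut`, so names containing spaces survive, and it skips files with fewer than three `_`-separated fields, whose two-field prefix would otherwise clash with the file name itself.

```bash
# Hypothetical helper: move each *.gz file in directory $1 (default: .) into a
# subdirectory named after the first two "_"-separated fields of its name,
# e.g. A_B_R1.fastq.gz -> A_B/A_B_R1.fastq.gz.
files_to_subdirs() {
  local dir="${1:-.}" f base first rest prefix
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue        # no *.gz files: the glob stays unexpanded
    base=${f##*/}                  # file name without the directory part
    case $base in
      *_*_*) ;;                    # at least three fields: safe to move
      *) continue ;;               # e.g. A_B.gz: prefix would equal the name
    esac
    first=${base%%_*}              # "A"
    rest=${base#*_}                # "B_R1.fastq.gz"
    prefix="${first}_${rest%%_*}"  # "A_B"
    mkdir -p "$dir/$prefix"
    mv -- "$f" "$dir/$prefix/"
  done
}
```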
skurscheid / amILocal.R
Created July 8, 2016 04:54
amILocal.R
# checks if the script is being run on a machine with the given name;
# useful e.g. when code needs host-dependent file paths
amILocal <- function(machinename = NULL){
  if (is.null(machinename)) stop("Machinename is missing")
  m <- Sys.info()["nodename"]
  # keep only the short host name (the part before the first ".")
  mn <- unlist(lapply(strsplit(m, "\\."), function(x) x[1]))
  if (mn == machinename) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}
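The same check from a shell script, as a minimal sketch; `am_i_local` is a hypothetical name and, like `amILocal()`, it compares only the part of the node name before the first `.`.

```bash
# Hypothetical helper: return 0 if the short host name (the node name up to
# its first ".") equals $1, 1 if it differs, 2 if no name was given.
am_i_local() {
  if [ -z "${1:-}" ]; then
    echo "am_i_local: machine name is missing" >&2
    return 2
  fi
  local node
  node=$(uname -n)
  [ "${node%%.*}" = "$1" ]
}
```

For example, `if am_i_local mylaptop; then data_dir=~/data; else data_dir=/scratch/data; fi`, where the host name and both paths are placeholders.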
skurscheid / awkChromIDreformat.txt
Last active October 27, 2016 22:47
using awk to add the "chr" prefix to BED file chromosome IDs (Ensembl to UCSC style)
awk 'BEGIN {FS = OFS = "\t"} {$1 = "chr" $1; print}' < [input.bed] > [output.bed]
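A slightly more defensive variant, as a sketch (`bed_add_chr` is a hypothetical name). It assumes a tab-separated BED file whose header lines start with `#`, `track` or `browser`, passes those through untouched, and maps Ensembl's mitochondrial contig `MT` to UCSC's `chrM`.

```bash
# Hypothetical helper: read BED on stdin, write it to stdout with "chr"
# prepended to column 1. Assumes tab-separated input; header lines
# (#, track, browser) pass through unchanged; Ensembl "MT" becomes "chrM".
bed_add_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^(#|track|browser)/ { print; next }
       { $1 = ($1 == "MT") ? "chrM" : ("chr" $1); print }'
}
```

Usage: `bed_add_chr < input.bed > output.bed`. Unplaced and alternate contigs follow different naming schemes in Ensembl and UCSC, so a simple prefix does not convert them correctly.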
skurscheid / awkChromIDUCSCtoEnsembl.txt
Created October 27, 2016 22:52
manipulate BED files containing UCSC chromosome identifiers to conform with Ensembl annotation
awk 'BEGIN {FS = OFS = "\t"} {sub(/^chr/, "", $1); print}' < mm10_rRNA.bed > temp.bed
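The reverse direction under the same assumptions, as a sketch (`bed_strip_chr` is hypothetical): `sub(/^chr/, ...)` removes the prefix only at the start of column 1, and UCSC's `chrM` becomes Ensembl's `MT`.

```bash
# Hypothetical helper: read BED on stdin, strip a leading "chr" from column 1
# and rename "M" to "MT". Assumes tab-separated input; header lines
# (#, track, browser) pass through unchanged.
bed_strip_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^(#|track|browser)/ { print; next }
       { sub(/^chr/, "", $1); if ($1 == "M") $1 = "MT"; print }'
}
```

Usage: `bed_strip_chr < mm10_rRNA.bed > temp.bed`.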
skurscheid / boxplot_example.R
Last active November 7, 2016 02:58
boxplot_example.R
# load the expression data first: its dimensions and row/column names are reused below
MCF10_expression_data <- read.csv("MCF10_expression_data.csv", row.names = 1)
# preparing example data for clusters: a random label ("cluster1" to "cluster7") per cell
MCF10_promoter_clusters <- as.data.frame(matrix(nrow = nrow(MCF10_expression_data), ncol = ncol(MCF10_expression_data)))
colnames(MCF10_promoter_clusters) <- colnames(MCF10_expression_data)
rownames(MCF10_promoter_clusters) <- rownames(MCF10_expression_data)
for (x in colnames(MCF10_promoter_clusters)){
  MCF10_promoter_clusters[, x] <- paste0("cluster", sample(1:7, size = nrow(MCF10_promoter_clusters), replace = TRUE))
}
# example of plotting expression data by cluster group
skurscheid / heatmap_from_qPCR_data.R
Created April 27, 2017 06:30
creates a "heatmap" from pre-processed qPCR data using ggplot2 geom_tile()
# library() stops with an error if a package is missing; require() only warns
library(gdata)   # read.xls(), which needs a working Perl installation
library(dtplyr)
library(ggplot2)
# setwd("~/OneDrive/Documents/ANU/Tremethick Lab/Students/Yichen - BSc Hons/")
# load the Excel sheet
# this assumes the file is in the current working directory (check with getwd())
dat <- read.xls("CTA HL summary.xlsx")
skurscheid / sample_formatting.sh
Created August 23, 2017 05:59
format the file names of paired-end (PE) FASTQ files as [R1, R2] pairs for a JSON config file
for i in $(ls *.gz | cut -f 1 -d "_" | sort -u | sort -n); do f=("${i}"_*.gz); echo "[\"${f[0]}\", \"${f[1]}\"],"; done
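Because every line ends in a comma, the loop prints a list that needs hand-editing before it is valid JSON. The bash sketch below (the function name `pe_pairs_json` is hypothetical) wraps the pairs in one array without a trailing comma and stops if a sample does not have exactly two files. It assumes names of the form `<sample>_<rest>.gz` in which R1 sorts before R2, and it avoids `mapfile` so it also runs under the bash 3.2 shipped with macOS.

```bash
# Hypothetical helper: print the PE FASTQ files in directory $1 (default: .)
# as one JSON array of [R1, R2] pairs, one pair per sample.
# Assumes names like <sample>_<rest>.gz with R1 sorting before R2.
pe_pairs_json() {
  local dir="${1:-.}" sample sep=""
  local -a reads
  printf '['
  while IFS= read -r sample; do
    reads=("$dir/$sample"_*.gz)
    # An unmatched glob stays literal (1 element), so this also catches 0 files.
    if [ "${#reads[@]}" -ne 2 ]; then
      echo "pe_pairs_json: expected 2 files for sample $sample" >&2
      return 1
    fi
    # The separator precedes every pair except the first: no trailing comma.
    printf '%s["%s", "%s"]' "$sep" "${reads[0]##*/}" "${reads[1]##*/}"
    sep=", "
  done < <(
    # Unique sample IDs (text before the first "_"), in numeric order.
    for f in "$dir"/*_*.gz; do
      [ -e "$f" ] || continue
      f=${f##*/}
      printf '%s\n' "${f%%_*}"
    done | sort -u | sort -n
  )
  printf ']\n'
}
```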
skurscheid / getTrueName.c
Created February 5, 2018 23:29
C function to enable bash shells to "cd" into macOS aliases
// getTrueName.c
// http://web.archive.org/web/20100110234300/http://www.macosxhints.com/dlfiles/getTrueName.txt
//
// DESCRIPTION
// Resolve HFS and HFS+ aliased files (and soft links), and return the
// name of the "Original" or actual file. Directories have a "/"
// appended. The error number returned is 255 on error, 0 if the file
// was an alias, or 1 if the argument given was not an alias
//
// BUILD INSTRUCTIONS