Stelios Sfakianakis sgsfak

  • Heraklion Greece
  • LinkedIn in/sgsfak
@sgsfak
sgsfak / august5.sh
Last active August 29, 2015 13:56
Find years in which August has 5 Fridays, Saturdays, and Sundays
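# Assumes ncal starts weeks on Monday, so its last three rows are Fr, Sa, Su.
for year in $(seq 2014 2050); do
  # For each of those rows, print the number of days (NF minus the weekday label).
  ncal 8 "$year" | tail -3 | (printf '%s,' "$year"; awk 'BEGIN{ORS=","} {print NF-1}'; echo)
done | grep -E "5,5,5,$" | cut -f1 -d,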
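The shell version depends on how ncal lays out its rows. As a cross-check, the same question can be sketched in Python with the standard `calendar` module. The function names below are illustrative and not part of the original gist.

```python
import calendar


def august_weekday_counts(year):
    """Count how often each weekday (0 = Monday ... 6 = Sunday) falls in August."""
    counts = [0] * 7
    for day, weekday in calendar.Calendar().itermonthdays2(year, 8):
        if day != 0:  # day 0 is padding from the neighbouring months
            counts[weekday] += 1
    return counts


def years_with_five_fri_sat_sun(first=2014, last=2050):
    """Years in which August has five Fridays, Saturdays, and Sundays."""
    return [
        year
        for year in range(first, last + 1)
        if august_weekday_counts(year)[4:7] == [5, 5, 5]
    ]


if __name__ == "__main__":
    print(years_with_five_fri_sat_sun())
```

Since August has 31 days, exactly three consecutive weekdays occur five times, so the condition is equivalent to August 1 falling on a Friday.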
sgsfak / parse_scpecg.js
Created February 11, 2014 13:01
Parses an SCP-ECG file and prints the demographics/data acquisition information (in Section 1)
var fs = require('fs');
var DEBUG = true;
var log = console.log;
if (!DEBUG)
    log = function() {};
function parse_scp_time(fval) {
    var hour = fval.readInt8(0);
# EXAMPLE USAGE
# Example of ColSideColors and RowSideColors (single column, single row)
mat <- matrix(1:100, byrow=TRUE, nrow=10)
column_annotation <- sample(c("red", "blue", "green"), 10, replace=TRUE)
column_annotation <- as.matrix(column_annotation)
colnames(column_annotation) <- c("Variable X")
row_annotation <- sample(c("red", "blue", "green"), 10, replace=TRUE)
row_annotation <- as.matrix(t(row_annotation))

Reproducing (some of) the sequence alignment processes of the 1000genomes project

Download the 1000genomes samples

First I download the sequence.index file from the 1000genomes EBI FTP site:

<out.json jq -r '.Root.ResultSet.Entity[] | [.color, .width, .average, .reference, .Time]|@csv' > out.csv
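For readers without jq, the same flattening can be sketched in Python. This is only an approximation of the jq filter: it assumes the input has the `Root.ResultSet.Entity` list implied by the command, the sample record is hypothetical, and a missing key raises `KeyError` here whereas jq would emit an empty field.

```python
import csv
import io
import json

# Field order taken from the jq filter above.
FIELDS = ["color", "width", "average", "reference", "Time"]


def entities_to_csv(document):
    """Render Root.ResultSet.Entity[] as CSV text, one row per entity."""
    out = io.StringIO()
    # QUOTE_NONNUMERIC quotes strings and leaves numbers bare, similar to jq's @csv.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for entity in document["Root"]["ResultSet"]["Entity"]:
        writer.writerow([entity[name] for name in FIELDS])
    return out.getvalue()


# Hypothetical sample input, only to show the expected shape.
sample = json.loads(
    '{"Root": {"ResultSet": {"Entity": ['
    '{"color": "red", "width": 2, "average": 1.5, "reference": "r1", "Time": "10:00"}'
    "]}}}"
)

if __name__ == "__main__":
    print(entities_to_csv(sample), end="")
```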
sgsfak / exiv2date
Created January 7, 2015 20:28
Given a set of JPEG images in a single directory, prints the Exif-based date and time information for each image
exiv2 -Pv -g Exif.Image.DateTime *.JPG | awk '{gsub(/:/,"/", $2); print $1,$2}'
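The awk step keeps the first two whitespace-separated fields (file name and date) and rewrites the date separators from `:` to `/`. A minimal Python sketch of that step follows. The sample line is a hypothetical illustration of exiv2 output for multiple files, and lines with fewer than two fields are returned unchanged, which differs slightly from awk.

```python
def reformat_exif_line(line):
    """Keep fields 1 and 2 of a line, turning ':' into '/' in field 2."""
    fields = line.split()
    if len(fields) < 2:
        return line.strip()  # nothing to reformat
    return f"{fields[0]} {fields[1].replace(':', '/')}"


if __name__ == "__main__":
    # Hypothetical line: file name, date, time.
    print(reformat_exif_line("IMG_0001.JPG  2015:01:07 20:28:00"))
```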
Q: what are "single tree-based" (as opposed to forest-based) supervised learning methods?
A: some of my favorites:
- ADT
+ wiki: http://en.wikipedia.org/wiki/Alternating_decision_tree
+ ref: http://perun.pmf.uns.ac.rs/radovanovic/dmsem/cd/install/Weka/doc/classifiers-papers/trees/ADTree/atrees.pdf
- rpart in R
+ http://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf
'''
Non-parametric computation of entropy and mutual-information
Adapted by G Varoquaux from code created by R Brette, itself
from several papers (see in the code).
These computations rely on nearest-neighbor statistics
'''
import numpy as np
/* curl_multi_test.c
Clemens Gruber, 2013
<clemens.gruber@pqgruber.com>
Code description:
Requests 4 Web pages via the CURL multi interface
and checks if the HTTP status code is 200.
Update: Fixed! The check for !numfds was the problem.
## gb_gpl17000.csv contains the Genbank ids of the cDNA EST transcripts, downloaded from
## http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL17000
## (the GB_ACC column)
## So for each of these accession numbers we download the corresponding page from NCBI (EST database)
## and search (grep) for the Entrez Gene id
cat gb_gpl17000.csv |
parallel -j4 --tagstring '{}' "curl -s http://www.ncbi.nlm.nih.gov/nucest/{} | fgrep '/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids='" |
gawk '{print $1, gensub(/^.*list_uids=([0-9]+).*$/, "\\1", "g", $3)}' |
tee gb_gpl17000_eg_map.txt
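
The gawk step pairs the accession tag (field 1) with the digits following `list_uids=` in field 3. A small Python sketch of that extraction, using the same regular expression, is shown below. The function name and the sample line are hypothetical, and unlike `gensub`, which returns the field unchanged when nothing matches, this sketch returns `None`.

```python
import re

# Same pattern as the gensub() call in the pipeline above.
LIST_UIDS = re.compile(r"^.*list_uids=([0-9]+).*$")


def map_accession_to_gene(line):
    """Return (accession, gene_id) for one tagged line, or None if no id is found."""
    fields = line.split()
    if len(fields) < 3:
        return None
    match = LIST_UIDS.match(fields[2])
    if match is None:
        return None
    return fields[0], match.group(1)


if __name__ == "__main__":
    # Hypothetical tagged line: accession, a tab, then the matching HTML fragment.
    sample = (
        'AB000001\t<a href="/sites/entrez?db=gene&amp;cmd=retrieve'
        '&amp;list_uids=12345">Gene</a>'
    )
    print(map_accession_to_gene(sample))
```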