Renaud Richardet renaud

  • Eaternity
  • Lausanne, Switzerland
@renaud
renaud / dca2ldac.py
Last active December 25, 2015 09:39
Transforms topic-model input file, from DCA format (space separated) to LDA-C format (column-separated)
'''
Transforms topic-model input file,
from DCA format (space separated)
to LDA-C format (column-separated)
@author renaud@apache.org
'''
import sys
dca_file = sys.argv[1]
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap
@renaud
renaud / go_fetch.py
Created October 17, 2013 07:36
Retrieve synonyms of gene ontology (GO) terms
import urllib, sys
from xml.etree import cElementTree as ElementTree
def get_go_name(go_id):
sys.stdout.write("GO"+go_id+"\t"),
#get the GO entry as XML
xml = urllib.urlopen("http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:"+go_id+"&format=oboxml")
#open in cElementTree, for fast XML parsing
for event, element in ElementTree.iterparse(xml):
#need to make sure we are getting the name contained within the 'term' entry
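A minimal Python 3 sketch of the streaming-parse idea in the preview above, run on an inline sample instead of a live request. The element names (`term`, `name`, `synonym`, `synonym_text`) and the sample document are assumptions made for illustration, not confirmed details of the QuickGO response; `term_names_and_synonyms` is a hypothetical helper name.

```python
import io
from xml.etree import ElementTree

# Made-up sample, shaped like an OBO-XML response (assumed layout).
SAMPLE_OBOXML = b"""<obo>
  <term>
    <id>GO:0000001</id>
    <name>example term</name>
    <synonym scope="exact"><synonym_text>example synonym</synonym_text></synonym>
  </term>
</obo>"""

def term_names_and_synonyms(xml_stream):
    """Return (name, [synonyms]) for every <term> element in the stream."""
    results = []
    # iterparse yields ("end", element) pairs by default, so a <term>
    # element is complete (children included) when it shows up here.
    for event, element in ElementTree.iterparse(xml_stream):
        if element.tag == "term":
            name = element.findtext("name")
            synonyms = [s.text for s in element.iter("synonym_text")]
            results.append((name, synonyms))
            element.clear()  # release memory held by the processed term
    return results

print(term_names_and_synonyms(io.BytesIO(SAMPLE_OBOXML)))
```

Reading only the `name` that is a direct child of `term` is one way to avoid picking up `name` elements that belong to other parts of the document.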
@renaud
renaud / git_aliases.sh
Created October 30, 2013 13:46
Aliases that turn git status output into git add / git rm commands for modified and deleted files
alias gita="git status | egrep 'modified: ' | sed $'s/#\tmodified:/git add/g'"
alias gitd="git status | egrep 'deleted: ' | sed $'s/#\tdeleted:/git rm/g'"
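A rough Python equivalent of the two aliases above, as a sketch only. It assumes the input is the output of `git status --porcelain` (version 1 format, where the second character of each line is the working-tree status) rather than the human-readable output the aliases parse; `git_commands` is a hypothetical helper name.

```python
def git_commands(porcelain_output):
    """Build 'git add' / 'git rm' command lines from porcelain v1 output."""
    commands = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # not a "XY path" status line
        worktree_status, path = line[1], line[3:]
        if worktree_status == "M":      # modified, not yet staged
            commands.append("git add " + path)
        elif worktree_status == "D":    # deleted, not yet staged
            commands.append("git rm " + path)
    return commands

sample = " M a.py\n D b.txt\n?? c.txt\n"
print("\n".join(git_commands(sample)))
```

To feed it real data, one option is `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`. Paths containing spaces or special characters are quoted by git in this format, and this sketch does not unquote them.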
@renaud
renaud / subscripts.java
Last active December 27, 2015 23:59
Finding mis-extracted subscripts in PDFs, using PDFTextStream
@Test
public void testSubscripts() throws Exception {
final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");
File ROOT = new File(
"/Volumes/scratch/richarde/pdfs/201307/");
for (File pdf : ROOT.listFiles()) {
if (pdf.getName().endsWith(".pdf")) {
try {
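The pattern in the preview above flags lines made up only of digits and spaces, at least 10 characters long. A minimal Python sketch of that check, applied to already-extracted text; the PDF extraction step is left out, and `suspicious_lines` is a hypothetical helper name.

```python
import re

# Same pattern as in the Java test: a whole line of 10 to 1000
# characters, each of which is a space or a digit.
SUBSCRIPTS = re.compile(r"^[ \d]{10,1000}$")

def suspicious_lines(text):
    """Return the lines that look like a run of stray subscripts."""
    return [line for line in text.splitlines() if SUBSCRIPTS.match(line)]

page = "H O and CO were measured\n2 2 3 4 2 2 6 2 1\n12345\n"
print(suspicious_lines(page))
```

The lower bound of 10 characters keeps short numeric lines, such as page numbers, from being reported.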
@renaud
renaud / DkPro Binary Cas evaluation.md
Last active December 28, 2015 13:09
DKPro binary CAS evaluation
@renaud
renaud / tornado_rest_server.py
Created February 16, 2016 09:23
Tornado REST server base template
'''
REST endpoint
'''
import os, json
from datetime import date
from tornado import ioloop, web, autoreload
''' serves index.html'''
@renaud
renaud / psi.m
Created May 21, 2014 21:29
Digamma (psi) function, since Octave does not implement it yet
function y = psi(x)
%DIGAMMA Digamma function.
% DIGAMMA(X) returns digamma(x) = d log(gamma(x)) / dx
% If X is a matrix, returns the digamma function evaluated at each element.
% Reference:
%
% J Bernardo,
% Psi ( Digamma ) Function,
% Algorithm AS 103,
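For comparison, a short Python sketch of one common way to evaluate the digamma function for positive real x: the recurrence psi(x) = psi(x + 1) - 1/x shifts the argument upward, then a truncated asymptotic series is applied. The thresholds (1e-5 and 8.5) and the number of series terms are choices made for this sketch and are not taken from the gist's code.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma(x):
    """Approximate psi(x) = d/dx log(gamma(x)) for real x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    if x <= 1e-5:
        # Near zero, psi(x) is approximately -gamma - 1/x.
        return -EULER_GAMMA - 1.0 / x
    result = 0.0
    # Recurrence: psi(x) = psi(x + 1) - 1/x, applied until x is large.
    while x < 8.5:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic series:
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

print(digamma(1.0))  # close to -EULER_GAMMA
```

A quick sanity check is the identity psi(x + 1) - psi(x) = 1/x, for example `digamma(2.0) - digamma(1.0)` should be close to 1.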
@renaud
renaud / acronyms.py
Created August 22, 2016 14:51
Link abbreviations to their full names. Based on "A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text", A. Schwartz and M. Hearst, Biocomputing, 2003, pp 451-462.
#!/usr/bin/env python
'''Link abbreviations to their full names
Based on
A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text
A. Schwartz and M. Hearst
Biocomputing, 2003, pp 451-462.
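The core step of the Schwartz and Hearst approach is matching the characters of a short form, right to left, against a candidate long form, with the first character required to start a word. Below is a Python sketch of that step, written from the published description; it is not the code of this gist, and `find_best_long_form` is a name chosen here. Candidate extraction (locating the parentheses and bounding the search window) is left out.

```python
def find_best_long_form(short_form, long_form):
    """Return the shortest suffix of long_form that matches short_form,
    or None if the characters cannot be matched in order."""
    l_index = len(long_form) - 1
    # Walk the short form from its last character to its first.
    for s_index in range(len(short_form) - 1, -1, -1):
        ch = short_form[s_index].lower()
        if not ch.isalnum():
            continue  # punctuation in the short form is ignored
        # Move left in the long form until the character matches; the
        # first short-form character must also sit at the start of a word.
        while (l_index >= 0 and long_form[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
    # Extend back to the beginning of the word that was matched last.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]

print(find_best_long_form("PCR", "the polymerase chain reaction"))
```

The function drops leading words that are not needed for the match, which is why "the" is not part of the long form returned for "PCR".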
@renaud
renaud / .block
Last active October 24, 2017 09:12
Visualizing PageRank
license: apache-2.0
height: 650
scrolling: no
border: no