Renaud Richardet renaud

  • Eaternity
  • Lausanne, Switzerland
@renaud
renaud / 1_ReferencesClassifier2.java
Last active December 22, 2015 19:39
Mallet MaxEnt classifier for paper references
package ch.epfl.bbp.uima.projects.references;
import static cc.mallet.pipe.iterator.FileIterator.LAST_DIRECTORY;
import static com.google.common.collect.Lists.newArrayList;
import static java.util.regex.Pattern.compile;
import static org.apache.commons.lang.StringUtils.join;
import static org.slf4j.LoggerFactory.getLogger;
import java.io.File;
import java.io.FileFilter;
renaud / migrate_to_uimaFIT_2.sh
Created September 27, 2013 09:57
shell commands to ease the migration to uimaFIT version 2
#!/bin/sh
############################################
# MAKE SURE TO BACK UP YOUR FILES FIRST
############################################
# see http://uima.apache.org/d/uimafit-2.0.0/tools.uimafit.book.html#d5e617
#Change of package names:
find . -name '*.java' -print | xargs perl -p -i -e 's/org\.uimafit/org.apache.uima.fit/g'
renaud / dca2ldac.py
Last active December 25, 2015 09:39
Transforms topic-model input file, from DCA format (space separated) to LDA-C format (colon-separated)
'''
Transforms topic-model input file,
from DCA format (space separated)
to LDA-C format (colon-separated)
@author renaud@apache.org
'''
import sys
dca_file = sys.argv[1]
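The preview above stops before the conversion itself. As a rough illustration only, the sketch below turns one line of space-separated term ids into an LDA-C line of the form `M id:count id:count ...`, where `M` is the number of unique terms. The input layout (one document per line, one id per token occurrence) and the function name `tokens_to_ldac` are assumptions made for this example, not taken from the gist.

```python
from collections import Counter


def tokens_to_ldac(line):
    """Convert one space-separated line of term ids into an LDA-C line.

    Assumption (hypothetical input layout): each line is one document and
    each token is a term id, repeated once per occurrence.
    """
    counts = Counter(line.split())
    # Counter keeps first-seen order, so terms appear in input order.
    pairs = " ".join("{}:{}".format(term, n) for term, n in counts.items())
    # LDA-C lines start with the number of unique terms in the document.
    return "{} {}".format(len(counts), pairs).rstrip()


if __name__ == "__main__":
    print(tokens_to_ldac("3 7 3 12 7 3"))
```

If the real DCA file already stores counts rather than repeated ids, only the counting step would need to change; the output layout stays the same.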
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap
renaud / go_fetch.py
Created October 17, 2013 07:36
Retrieve synonyms of gene ontology (GO) terms
import urllib, sys
from xml.etree import cElementTree as ElementTree

def get_go_name(go_id):
    sys.stdout.write("GO" + go_id + "\t")
    # get the GO entry as XML
    xml = urllib.urlopen("http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:" + go_id + "&format=oboxml")
    # open in cElementTree, for fast XML parsing
    for event, element in ElementTree.iterparse(xml):
        # need to make sure we are getting the name contained within the 'term' entry
renaud / git_aliases.sh
Created October 30, 2013 13:46
aliases to list files to add/delete from git
alias gita="git status | egrep 'modified: ' | sed $'s/#\tmodified:/git add/g'"
alias gitd="git status | egrep 'deleted: ' | sed $'s/#\tdeleted:/git rm/g'"
renaud / subscripts.java
Last active December 27, 2015 23:59
finding misextracted subscripts in pdfs, using PdfTextStream
@Test
public void testSubscripts() throws Exception {
    final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");
    File ROOT = new File("/Volumes/scratch/richarde/pdfs/201307/");
    for (File pdf : ROOT.listFiles()) {
        if (pdf.getName().endsWith(".pdf")) {
            try {
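The pattern in the test above flags lines made up only of digits and spaces, 10 to 1000 characters long, which is what a run of misextracted subscripts tends to look like. A minimal Python sketch of the same check follows; the helper name `looks_like_stray_subscripts` is hypothetical, and the PDF text extraction step (done with PdfTextStream in the gist) is left out.

```python
import re

# Same pattern as the Java snippet. re.ASCII restricts \d to 0-9,
# which is how Java treats \d by default.
SUBSCRIPTS = re.compile(r"^[ \d]{10,1000}$", re.ASCII)


def looks_like_stray_subscripts(line):
    """Return True if the line consists only of digits and spaces (10-1000 chars)."""
    return SUBSCRIPTS.match(line) is not None


if __name__ == "__main__":
    for text in ["2 3 2 4 2 1 3", "Ca2+ 2 3 4 5 6", "1 2 3"]:
        print(repr(text), looks_like_stray_subscripts(text))
```

The lower bound of 10 characters keeps short numeric lines, such as page numbers, from being flagged.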
renaud / DkPro Binary Cas evaluation.md
Last active December 28, 2015 13:09
DKPro binary CAS evaluation
renaud / test.py
Last active August 3, 2019 12:58
python NCBI example
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db="pubmed", term="pyramidal cell")
record = Entrez.read(handle)
int(record["Count"])  # total number of hits; this matches http://www.ncbi.nlm.nih.gov/pubmed/?term=pyramidal%20cell
first = record["IdList"][0]
# from http://stackoverflow.com/a/20149984/125617
renaud / gist:8858885
Created February 7, 2014 08:10
evaluate Mallet CRF
cc.mallet.types.InstanceList.CrossValidationIterator crossValidationIt =
        trainingInstanceList.crossValidationIterator(folds, new Random().nextInt());
while (crossValidationIt.hasNext()) {
    InstanceList[] il = crossValidationIt.nextSplit();
    CRF crf = new CRF(trainingInstanceList.getPipe(), null);
    CRFTrainerByThreadedLabelLikelihood trainer = new CRFTrainerByThreadedLabelLikelihood(crf, threads);
    // CRFTrainerByLabelLikelihood trainer = new CRFTrainerByLabelLikelihood(crf);
    MultiSegmentationEvaluator eval = new MyMultiSegmentationEvaluator(//
            new InstanceList[] { testingSet }, new String[] { "TTesting" }, tags, continueTags);