Andreas van Cranenburgh andreasvc
andreasvc / bowclassify.py
Created May 6, 2019
A baseline Bag-of-Words text classification
View bowclassify.py
"""A baseline Bag-of-Words text classification.
Usage: python3 classify.py <train.txt> <test.txt> [--svm] [--tfidf]
train.txt and test.txt should contain one "document" per line,
first token should be the label.
By default, use Naive Bayes and relative frequencies.
Pass --svm to use Linear SVM instead of Naive Bayes;
Pass --tfidf to use tf-idf instead of relative frequencies.
"""
import sys
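
The preview above is cut off after the docstring, so the gist's actual implementation is not shown here. As a rough illustration only, the following sketch reads the input format the docstring describes (one document per line, first token is the label) and trains a small multinomial Naive Bayes classifier using only the standard library. The function names `parse`, `train`, and `predict`, the use of raw counts, and the add-one smoothing are assumptions made for this sketch, not taken from the gist.

```python
import math
from collections import Counter, defaultdict


def parse(lines):
    """Split each non-empty line into (label, tokens); the first token is the label."""
    docs = []
    for line in lines:
        tokens = line.split()
        if tokens:
            docs.append((tokens[0], tokens[1:]))
    return docs


def train(docs):
    """Collect per-label word counts, label frequencies, and the vocabulary."""
    wordcounts = defaultdict(Counter)
    labelcounts = Counter()
    for label, tokens in docs:
        labelcounts[label] += 1
        wordcounts[label].update(tokens)
    vocab = {word for counts in wordcounts.values() for word in counts}
    return wordcounts, labelcounts, vocab


def predict(tokens, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    wordcounts, labelcounts, vocab = model
    total = sum(labelcounts.values())
    best, bestscore = None, -math.inf
    for label in sorted(labelcounts):
        score = math.log(labelcounts[label] / total)
        denom = sum(wordcounts[label].values()) + len(vocab)
        for word in tokens:
            if word in vocab:  # ignore words never seen in training
                score += math.log((wordcounts[label][word] + 1) / denom)
        if score > bestscore:
            best, bestscore = label, score
    return best


# Toy data in the "label first" format; not from any real dataset.
train_lines = [
    'pos a great and moving film',
    'pos great acting',
    'neg a dull and boring film',
    'neg boring plot',
]
model = train(parse(train_lines))
print(predict('great film'.split(), model))
```

The `--svm` and `--tfidf` options mentioned in the docstring are not covered by this sketch.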
View prepare_110kDBRD_fasttext.py
"""Prepare https://benjaminvdb.github.io/110kDBRD/ for use with fastText.
Divide train set into 90% train and 10% dev, balance positive and negative
reviews, and shuffle. Write result in fastText format."""
import os
import re
import random
import glob
from syntok.tokenizer import Tokenizer
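
Only the imports of this script are visible in the preview. The sketch below shows one possible way to carry out the steps named in the docstring: balance the two classes, shuffle, hold out 10% as a dev set, and write lines with fastText's `__label__` prefix. The function name `prepare`, the label names `pos` and `neg`, and balancing by downsampling the larger class are assumptions for illustration; tokenization with syntok is left out.

```python
import random


def prepare(pos, neg, seed=0):
    """Balance two lists of reviews, shuffle, and split 90/10 into train/dev lines."""
    rng = random.Random(seed)
    n = min(len(pos), len(neg))  # assumption: balance by downsampling
    data = ([('pos', text) for text in rng.sample(pos, n)]
            + [('neg', text) for text in rng.sample(neg, n)])
    rng.shuffle(data)
    # fastText expects each line to start with the label, prefixed by __label__
    lines = ['__label__%s %s' % (label, text) for label, text in data]
    ndev = len(lines) // 10  # 10% held out as dev set
    return lines[ndev:], lines[:ndev]


# Placeholder reviews; a real run would read the dataset files instead.
pos = ['review %d good' % n for n in range(30)]
neg = ['review %d bad' % n for n in range(20)]
trainlines, devlines = prepare(pos, neg)
print(len(trainlines), len(devlines))
```

Note that this simple split does not guarantee that the dev set itself is balanced; whether the original script does is not visible from the preview.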
andreasvc / corenlpxmltoconll2012.py
Last active May 8, 2019
Convert XML output of Stanford CoreNLP to CoNLL 2012 format
View corenlpxmltoconll2012.py
"""Convert XML output of Stanford CoreNLP to CoNLL 2012 format.
$ ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
-output.printSingletonEntities true \
-file /tmp/example.txt
$ python3 corenlpxmltoconll2012.py example.txt.xml > example.conll
"""
import re
import sys
from lxml import etree
View iwpt2013.xml
<?xml version='1.0' encoding='UTF-8'?>
<volume id="W13">
<paper id="5700">
<title>Proceedings of The 13th International Conference on Parsing Technologies (IWPT 2013)</title>
<editor><first>Harry</first><last>Bunt</last></editor>
<editor><first>Khalil</first><last>Sima'an</last></editor>
<editor><first>Liang</first><last>Huang</last></editor>
<month>November</month>
<year>2013</year>
<address>Nara, Japan</address>
andreasvc / xmientityrename.py
Last active Apr 1, 2019
Rename numeric entity labels in .xmi file to text of first mention
View xmientityrename.py
"""Rename numeric entity labels in .xmi file to text of first mention.
Usage: python3 xmientityrename.py <FILE>...
Original file is modified in-place.
Only non-empty entities with numeric names are changed.
See https://github.com/nilsreiter/CorefAnnotator/issues/173"""
import os
import sys
from lxml import etree
andreasvc / preprocess.py
Created Feb 10, 2019
Preprocess movie review polarity dataset v2.0
View preprocess.py
"""Preprocess movie review polarity dataset v2.0.
http://www.cs.cornell.edu/people/pabo/movie-review-data/
"""
import os
import re
import glob
import random
from syntok.tokenizer import Tokenizer
def process(path, pattern, out):
View cellbench.pyx
"""Run with python -c 'import pyximport; pyximport.install(); import cellbench; cellbench.main()'
"""
from libc.stdint cimport uint32_t
from libc.math cimport sqrt, modf
from libc.math cimport round as c_round
ctypedef uint32_t Label
cdef inline size_t cellidx(short start, short end, short lensent,
Label nonterminals):
View adventofcode.py
"""Advent of Code 2017. http://adventofcode.com/2017 """
import sys
import array
from collections import Counter, defaultdict
from operator import xor
from functools import reduce
from itertools import count
from binascii import hexlify
import numpy as np
andreasvc / README.md
Last active Feb 6, 2018
Word lists for extraction of physical descriptions
View README.md

Word lists for extraction of physical descriptions

These are XPath macros used in our DSH paper on physical descriptions of appearance.

English translation of macro names

  • uiterlijkN = looksN
  • uiterlijkA = looksA
  • persoon = person
  • kleding = clothing
View checkall.py
"""Tool to check if function/class definitions in Python files match with
their __all__ attribute. Rudimentary support for Cython.
"""
import sys
import re
from collections import Counter
for filename in sys.argv[1:]:
    with open(filename, 'rt') as inp:
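
The preview ends before the body of the loop, so the actual checking logic is not shown. As a hedged sketch of the general idea in the docstring, the following compares top-level `def`/`class` names found with a regular expression against the names listed in `__all__`. The function name `checkall`, the choice to skip names starting with an underscore, and the restriction to a single-list `__all__` assignment are assumptions of this sketch; Cython definitions (`cdef`, `cpdef`) are not handled.

```python
import re


def checkall(source):
    """Return (public names missing from __all__, names in __all__ never defined)."""
    # Top-level definitions only: 'def' or 'class' at the start of a line.
    defined = set(re.findall(r'^(?:def|class)\s+(\w+)', source, re.MULTILINE))
    match = re.search(r'^__all__\s*=\s*\[(.*?)\]', source,
                      re.MULTILINE | re.DOTALL)
    exported = set()
    if match:
        exported = set(re.findall(r'''['"](\w+)['"]''', match.group(1)))
    public = {name for name in defined if not name.startswith('_')}
    return sorted(public - exported), sorted(exported - defined)


example = '''
__all__ = ['foo', 'Bar', 'missing']

def foo():
    pass

class Bar:
    pass

def baz():
    pass

def _private():
    pass
'''
unlisted, undefined = checkall(example)
print(unlisted, undefined)
```

A regex-based check like this is deliberately rudimentary; parsing with the `ast` module would be more robust for pure Python files but would not work on Cython sources.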