Arne Neumann arne-cl

  • Potsdam
@arne-cl
arne-cl / rst2csv.py
Last active August 29, 2015 14:01
prints a rhetorical structure tree (RS3) as a table
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Title: rst.py
# Description: prints an RST tree as a table
# License: GPLv3
# Author: Andre Herzog
# Version: 0.1c
# Date: 26.03.2014
import sys
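A minimal sketch of the table output, assuming the usual RS3 layout: a `<body>` holding `<segment>` and `<group>` elements that carry `id`, `parent` and `relname` attributes. The column list and the helper names `rs3_to_rows` and `write_table` are hypothetical; the original script may print different columns.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical column layout; the original script's columns are unknown.
COLUMNS = ['id', 'type', 'parent', 'relname', 'text']


def rs3_to_rows(rs3_xml):
    """Yield one row per <segment> or <group> element of an RS3 document.

    Assumes the common RS3 layout: <rst><header>...</header><body>...</body>
    where the body's children carry id/parent/relname attributes.
    """
    root = ET.fromstring(rs3_xml)
    body = root.find('body')
    if body is None:
        return
    for elem in body:
        if elem.tag == 'segment':
            # EDUs: normalise whitespace in the segment text
            node_type = 'segment'
            text = ' '.join((elem.text or '').split())
        elif elem.tag == 'group':
            # non-terminal nodes, e.g. type="span" or type="multinuc"
            node_type = elem.get('type', '')
            text = ''
        else:
            continue
        yield [elem.get('id', ''), node_type, elem.get('parent', ''),
               elem.get('relname', ''), text]


def write_table(rs3_xml, out):
    """Write a header row plus one CSV row per RS3 node to the file object out."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rs3_to_rows(rs3_xml))
```

Usage: `write_table(open('doc.rs3', 'rb').read(), sys.stdout)`. Reading the file as bytes lets ElementTree honour the encoding declared in the XML header.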
arne-cl / download-unpack-list
Created July 11, 2014 08:11
download a compressed file, unpack it, cd into the directory and list its contents
#!/usr/bin/env bash
url="$1"
wget -q "$url"
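tarfile=${url##*/} # strip everything up to and including the last slash
dtrx "$tarfile"
dir=${tarfile%%.*} # strip the first dot and everything after it
cd "$dir"
ls
# TODO: stay in $dir after the script ends (the script runs in its own
# process, so its cd cannot change the calling shell's directory;
# sourcing the script or turning it into a shell function would)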
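The same steps can be sketched in Python with the standard library. This is an illustration, not a drop-in replacement: it only handles tar archives (dtrx supports many more formats), and the helper names and the suffix list are assumptions made for the example.

```python
import os
import tarfile
import urllib.request

# Archive suffixes this sketch knows how to strip (an assumption;
# dtrx itself handles many more formats).
KNOWN_SUFFIXES = ('.tar.gz', '.tar.bz2', '.tar.xz', '.tgz', '.tar')


def archive_name(url):
    """Mirror ${url##*/}: keep everything after the last slash."""
    return url.rsplit('/', 1)[-1]


def naive_dir_name(archive):
    """Mirror ${tarfile%%.*}: drop the first dot and everything after it."""
    return archive.split('.', 1)[0]


def dir_name(archive):
    """Strip only a known archive suffix, so 'foo-1.0.tar.gz' -> 'foo-1.0'."""
    for suffix in KNOWN_SUFFIXES:
        if archive.endswith(suffix):
            return archive[:-len(suffix)]
    return archive


def unpack_and_list(archive_path, dest_dir='.'):
    """Extract a tar archive into dest_dir and list its expected top-level dir."""
    with tarfile.open(archive_path) as tf:  # default mode detects compression
        if hasattr(tarfile, 'data_filter'):
            # reject absolute paths, links outside dest_dir etc. where supported
            tf.extractall(dest_dir, filter='data')
        else:
            tf.extractall(dest_dir)
    target = os.path.join(dest_dir, dir_name(os.path.basename(archive_path)))
    return sorted(os.listdir(target))


def download_unpack_list(url, dest_dir='.'):
    """Download url into dest_dir, then unpack it and list its contents."""
    path = os.path.join(dest_dir, archive_name(url))
    urllib.request.urlretrieve(url, path)
    return unpack_and_list(path, dest_dir)
```

Comparing `naive_dir_name` with `dir_name` shows a pitfall of `${tarfile%%.*}`: for a versioned archive such as `foo-1.0.tar.gz` it yields `foo-1`, so the script's `cd` fails when the archive unpacks into `foo-1.0`. A Python process cannot change its parent shell's directory either, so the TODO above applies here as well.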
arne-cl / decour2paula.py
Created July 23, 2014 16:53
convert a DeCour XML file into a PAULA XML document
#!/usr/bin/env python
# decour2paula.py: convert a DeCour XML file into a PAULA XML document
# usage: decour2paula.py decour_file paula_output_folder
import sys
from discoursegraphs.readwrite import DecourDocumentGraph
from discoursegraphs.readwrite import write_paula
if __name__ == '__main__':
    ddg = DecourDocumentGraph(sys.argv[1])
arne-cl / find_one_outlier.R
Created August 7, 2014 10:38
find one outlier using Grubbs' test
# code adapted from Lukasz Komsta's grubbs.test
# outlier value ("o" in grubbs.test) is stored in $outlier_value
# row name of the outlier ("G" in grubbs.test) is stored in $outlier_rowname
find_one_outlier <- function (x, opposite = FALSE)
{
    DNAME <- deparse(substitute(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)
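For reference, the statistic behind the one-outlier Grubbs' test is G = |suspect − mean| / s, with s the sample standard deviation and the suspect being the value farthest from the mean. The Python sketch below computes only G and the suspect value; it is not a port of the R function. Deciding significance additionally requires comparing G against the critical value ((n − 1)/√n) · √(t² / (n − 2 + t²)), where t is the upper α/(2n) quantile of Student's t distribution with n − 2 degrees of freedom (two-sided test, e.g. via `scipy.stats.t.ppf`), which the sketch leaves out. The name `grubbs_statistic` is hypothetical; `opposite` mimics the R argument of the same name.

```python
import math
import statistics


def grubbs_statistic(values, opposite=False):
    """Return (G, suspect) for a one-outlier Grubbs' test.

    suspect is the value farthest from the mean or, with opposite=True,
    the value at the other end of the sorted sample.
    """
    # drop missing values (None/NaN), like complete.cases() in R
    x = sorted(v for v in values if v is not None and not math.isnan(v))
    if len(x) < 3:
        raise ValueError("Grubbs' test needs at least 3 values")
    mean = statistics.mean(x)
    s = statistics.stdev(x)  # n - 1 denominator, like R's sd()
    if s == 0:
        raise ValueError('all values are identical')
    high_is_farther = (x[-1] - mean) >= (mean - x[0])
    pick_high = high_is_farther != opposite  # flip the choice if opposite
    suspect = x[-1] if pick_high else x[0]
    return abs(suspect - mean) / s, suspect
```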
arne-cl / which.py
Created August 20, 2014 13:08
prints the install path of a given Python package
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Arne Neumann
#
# Purpose: prints the path where the given Python package is installed.
# This might be interesting if you're working with multiple environments
# and are unsure if/where a package was installed.
import os
import sys
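One way to do the lookup on Python 3 is `importlib.util.find_spec`, which reports where a module would be imported from without running it. This is a sketch and not necessarily how `which.py` works (its preview only shows `import os` and `import sys`); `install_path` is a hypothetical name.

```python
import importlib.util
import sys


def install_path(name):
    """Return where `name` would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.submodule_search_locations:
        # a package (including namespace packages): report its directory
        return list(spec.submodule_search_locations)[0]
    # a plain module: its file path, or a marker such as 'built-in' or 'frozen'
    return spec.origin


if __name__ == '__main__':
    for pkg in sys.argv[1:]:
        print(pkg, install_path(pkg))
```

Run it with the interpreter of the environment in question (e.g. `python which.py requests`), since each environment has its own `sys.path`.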
arne-cl / mmax2_coreference_chains_in_discoursegraphs.ipynb
Created September 4, 2014 14:00
DiscourseGraphs: extracting coreference chains from MMAX2 annotated corpora
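Turning pairwise coreference links into chains amounts to computing connected components. The sketch below does this with union-find, assuming the links have already been read out as (mention, antecedent) pairs of markable IDs. The pair format and the name `coreference_chains` are hypothetical, and the notebook may use DiscourseGraphs' own API for this step instead.

```python
def coreference_chains(links):
    """Group (mention, antecedent) links into chains via union-find.

    `links` is a hypothetical input: an iterable of (markable_id,
    antecedent_id) pairs. Returns a sorted list of sorted chains.
    """
    parent = {}

    def find(m):
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    for mention, antecedent in links:
        root_a, root_b = find(mention), find(antecedent)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two chains

    chains = {}
    for m in list(parent):
        chains.setdefault(find(m), []).append(m)
    return sorted(sorted(chain) for chain in chains.values())
```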
arne-cl / fefe_scraper.py
Created March 31, 2015 12:22
simple scraper for blog.fefe.de
import os
import datetime
from datetime import timedelta
import requests
def create_dir(path):
    """
    Creates a directory. Warns if the directory can't be accessed. Passes,
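The scraper imports `datetime` and `timedelta`, which suggests it walks through the blog archive by date. Below is a minimal sketch of such a walk, stepping one day at a time and collecting the distinct months covered. The archive URL template is a hypothetical placeholder; check the blog's actual archive links before relying on it.

```python
from datetime import date, timedelta

# Hypothetical URL template for a monthly archive page.
ARCHIVE_URL = 'https://blog.fefe.de/?mon={}'


def days(start, end):
    """Yield every date from start to end (inclusive), one day at a time."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def month_keys(start, end):
    """Return the distinct 'YYYYMM' keys covered by [start, end], in order."""
    keys = []
    for day in days(start, end):
        key = day.strftime('%Y%m')
        if not keys or keys[-1] != key:
            keys.append(key)
    return keys


def archive_urls(start, end):
    """Build one (hypothetical) archive URL per month in the date range."""
    return [ARCHIVE_URL.format(key) for key in month_keys(start, end)]
```

Each URL could then be fetched with `requests.get(url)`, ideally with a short pause between requests.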
arne-cl / tmux.conf
Created September 15, 2015 17:33
tmux config file with better key bindings
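# use Ctrl-a instead of Ctrl-b as the prefix key
unbind-key C-b
set -g prefix C-a
# use | to split the current pane side by side (split-window -h)
# use - to split the current pane top/bottom (split-window -v)
# remove the default % binding for the side-by-side split
unbind %
bind | split-window -h
bind - split-window -v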
arne-cl / nested_dict_print.py
Created September 19, 2015 20:31
prettyprint a nested dictionary
def nprint(d, tab=0, tab_width=2):
    '''print nested key-value datastructures (e.g. dicts)'''
    for k, v in d.items():
        if not hasattr(v, 'items'):
            print('{}{} {}'.format(' ' * tab, k, v))
        else:
            print('{}{}:'.format(' ' * tab, k))
            nprint(v, tab=tab + tab_width, tab_width=tab_width)
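A Python 3 variant of `nprint` that yields the lines instead of printing them makes the output easy to join, write to a file or test. It is a sketch; the name `nlines` is hypothetical. Note that it passes `tab_width` down on recursion, so a custom width applies at every nesting level.

```python
def nlines(d, tab=0, tab_width=2):
    """Yield the indented lines for a nested key-value structure."""
    for k, v in d.items():
        if hasattr(v, 'items'):
            # nested mapping: print its key, then recurse one level deeper
            yield '{}{}:'.format(' ' * tab, k)
            yield from nlines(v, tab=tab + tab_width, tab_width=tab_width)
        else:
            yield '{}{} {}'.format(' ' * tab, k, v)


config = {'server': {'host': 'localhost', 'port': 8080}, 'debug': True}
print('\n'.join(nlines(config)))
```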
arne-cl / tigerxml2txt.py
Created December 11, 2013 16:15
converts TigerXML files into tokenized plain text (one word per line with an empty line between sentences).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Arne Neumann
#
# Purpose: extracts sentences from a Tiger XML input file and writes
# them to an output file (one word per line with an empty line
# between sentences).
import sys
import codecs
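A minimal sketch of the extraction, assuming the usual TigerXML layout in which every sentence `<s>` contains `<graph><terminals>` with one `<t word="..."/>` element per token. The helper names are hypothetical, and for very large corpora `ET.iterparse` would avoid loading the whole file into memory.

```python
import xml.etree.ElementTree as ET


def tiger_sentences(tiger_xml):
    """Yield the words of each <s> element as a list of strings."""
    root = ET.fromstring(tiger_xml)
    for sentence in root.iter('s'):
        # terminals appear in document order inside <graph><terminals>
        yield [t.get('word') for t in sentence.iter('t')]


def to_one_word_per_line(tiger_xml):
    """One word per line, with an empty line after each sentence."""
    lines = []
    for words in tiger_sentences(tiger_xml):
        lines.extend(words)
        lines.append('')
    return '\n'.join(lines) + '\n'
```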