Arne Neumann arne-cl

  • Potsdam
@arne-cl
arne-cl / mmax2_coreference_chains_in_discoursegraphs.ipynb
Created September 4, 2014 14:00
DiscourseGraphs: extracting coreference chains from MMAX2 annotated corpora
arne-cl / which.py
Created August 20, 2014 13:08
prints the install path of a given Python package
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Arne Neumann
#
# Purpose: prints the path where the given Python package is installed.
# This might be interesting if you're working with multiple environments
# and are unsure if/where a package was installed.
import os
import sys
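
The preview above stops after the imports. As a rough illustration of the idea in the description (finding out where a package is installed), here is a minimal sketch. It is not the gist's actual code: it assumes Python 3 and the standard library's `importlib.util.find_spec`, and the helper name `package_path` is made up for this example.

```python
import importlib.util


def package_path(name):
    """Return where an importable top-level package or module lives, or None.

    Hypothetical helper for illustration; not taken from which.py.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # nothing importable under that name in the current environment
        return None
    if spec.submodule_search_locations:
        # a package: report its directory
        return list(spec.submodule_search_locations)[0]
    # a plain module: report its file (built-in modules report "built-in")
    return spec.origin


if __name__ == "__main__":
    print(package_path("json"))
```

Running this inside each virtual environment in turn shows which copy of a package that environment would import.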
arne-cl / find_one_outlier.R
Created August 7, 2014 10:38
find one outlier using Grubbs' test
# code adapted from Lukasz Komsta's grubbs.test
# outlier value ("o" in grubbs.test) is stored in $outlier_value
# row name of the outlier ("G" in grubbs.test) is stored in $outlier_rowname
find_one_outlier <- function (x, opposite = FALSE)
{
    DNAME <- deparse(substitute(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)
arne-cl / decour2paula.py
Created July 23, 2014 16:53
convert a DeCour XML file into a PAULA XML document
#!/usr/bin/env python
# decour2paula.py: convert a DeCour XML file into a PAULA XML document
# usage: decour2paula.py decour_file paula_output_folder
import sys
from discoursegraphs.readwrite import DecourDocumentGraph
from discoursegraphs.readwrite import write_paula
if __name__ == '__main__':
    ddg = DecourDocumentGraph(sys.argv[1])
arne-cl / download-unpack-list
Created July 11, 2014 08:11
download a compressed file, unpack it, cd into the directory and list its contents
#!/usr/bin/env bash
url="$1"
wget -q "$url"
tarfile=${url##*/} # strip everything up to and including the last slash
dtrx "$tarfile"
dir=${tarfile%%.*} # strip off everything after the first dot
cd "$dir"
ls
# TODO: stay in $dir after the script ends
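
The two parameter expansions in the script are easy to misread, so here is a small Python sketch of what they compute. The function names and the example URLs are made up for illustration; only the string logic mirrors `${url##*/}` and `${tarfile%%.*}`.

```python
def archive_name(url):
    # like ${url##*/}: drop the longest prefix that ends in a slash
    return url.rsplit("/", 1)[-1]


def guessed_dir(tarfile):
    # like ${tarfile%%.*}: drop the longest suffix that starts with a dot
    return tarfile.split(".", 1)[0]


if __name__ == "__main__":
    # hypothetical URLs, used only to show the string handling
    for url in ("https://example.org/files/corpus.tar.gz",
                "https://example.org/files/tool-1.2.tar.gz"):
        name = archive_name(url)
        print(name, "->", guessed_dir(name))
```

The second URL shows a limitation of cutting at the first dot: a file name with a dotted version number yields `tool-1`, which would probably not match the unpacked directory. As for the TODO, a script runs in its own process, so its `cd` cannot change the calling shell's directory; sourcing the script or defining it as a shell function are the usual ways around that.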
arne-cl / rst2csv.py
Last active August 29, 2015 14:01
prints a rhetorical structure tree (RS3) as a table
# -*- coding: utf-8 -*-
# #!/usr/bin/python
# Title: rst.py
# Description: prints an RST tree as a table
# License: GPLv3
# Author: Andre Herzog
# vers.: 0.1c
# Date: 26.03.2014
import sys
arne-cl / epydoc_to_sphinx.sh
Last active June 11, 2021 14:04 — forked from Kami/migrate_docstrings.sh
converts epydoc docstrings to Sphinx docstrings (reStructuredText)
#!/usr/bin/env bash
#
# Script for migrating from epydoc to Sphinx style docstrings.
#
# WARNING: THIS SCRIPT MODIFIES FILES IN PLACE. BE SURE TO BACK THEM UP BEFORE
# RUNNING IT.
#
# Forked from: https://gist.github.com/Kami/6734885
DIRECTORY=$1
arne-cl / setup.py
Created May 4, 2014 13:02
minimal example setup.py for a single-module package
#!/usr/bin/env python
import sys
import os
try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup
here = os.path.abspath(os.path.dirname(__file__))
arne-cl / tigerxml2txt.py
Created December 11, 2013 16:15
converts TigerXML files into tokenized plain text (one word per line with an empty line between sentences).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Arne Neumann
#
# Purpose: extracts sentences from a Tiger XML input file and writes
# them to an output file (one word per line with an empty line
# between sentences).
import sys
import codecs
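
The preview ends at the imports. The conversion described in the header comment could look roughly like the sketch below. It is not the gist's code: it assumes Python 3, the standard `xml.etree.ElementTree` parser, and the usual TigerXML layout in which each `<s>` sentence element contains `<t>` terminals with a `word` attribute. The function name and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# tiny invented sample in the assumed TigerXML layout
SAMPLE = """<corpus><body>
<s id="s1"><graph><terminals>
  <t id="s1_1" word="Das" pos="PDS"/>
  <t id="s1_2" word="stimmt" pos="VVFIN"/>
</terminals></graph></s>
<s id="s2"><graph><terminals>
  <t id="s2_1" word="Ja" pos="PTKANT"/>
</terminals></graph></s>
</body></corpus>"""


def tiger_to_lines(xml_text):
    """Return one token per line, with an empty line after each sentence."""
    root = ET.fromstring(xml_text)
    lines = []
    for sentence in root.iter("s"):
        for terminal in sentence.iter("t"):
            lines.append(terminal.get("word", ""))
        lines.append("")  # sentence boundary
    return lines


if __name__ == "__main__":
    print("\n".join(tiger_to_lines(SAMPLE)))
```

For large corpora, a streaming parser such as `ET.iterparse` would avoid loading the whole file into memory; the sketch keeps everything in memory for brevity.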