Nathan Schneider nschneid

@nschneid
nschneid / BibTeX_nschneid_longform.js
Last active May 11, 2019 23:31
Custom BibTeX exporters for Zotero
{
"translatorID": "8a68255b-24e5-47e0-afe5-f65fff578170",
"translatorType": 3,
"label": "BibTeX (nschneid, long form)",
"creator": "Simon Kornblith and Richard Karnesky and Nathan Schneider",
"target": "bib",
"minVersion": "2.1.9",
"maxVersion": null,
"priority": 200,
"inRepository": true,
nschneid / allcats.txt
Created June 25, 2012 15:52
Document-to-category mapping for NLTK ptb module (full Penn Treebank corpus reader)
WSJ/00/WSJ_0001.MRG news
WSJ/00/WSJ_0002.MRG news
WSJ/00/WSJ_0003.MRG news
WSJ/00/WSJ_0004.MRG news
WSJ/00/WSJ_0005.MRG news
WSJ/00/WSJ_0006.MRG news
WSJ/00/WSJ_0007.MRG news
WSJ/00/WSJ_0008.MRG news
WSJ/00/WSJ_0009.MRG news
WSJ/00/WSJ_0010.MRG news
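Each line of the preview pairs a corpus file ID with a category, separated by whitespace. As a minimal sketch, assuming that one-pair-per-line format holds for the whole file, the mapping could be loaded into a dictionary like this. The function name `load_categories` is hypothetical and is not part of NLTK.

```python
from collections import defaultdict


def load_categories(lines):
    """Group file IDs by category.

    Assumes one "<fileid> <category>" pair per line, whitespace-separated,
    as in the preview above. Blank lines are skipped.
    """
    by_category = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fileid, category = line.split(None, 1)
        by_category[category].append(fileid)
    return dict(by_category)


sample = ["WSJ/00/WSJ_0001.MRG news", "WSJ/00/WSJ_0002.MRG news"]
mapping = load_categories(sample)
```

Passing an open file object instead of `sample` would work the same way, since the function only iterates over lines.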
nschneid / deverbals-from-nombank-examples.txt
Created June 25, 2012 22:44
List deverbal nominalizations in NomBank
abandonment.01 verb-abandon.01 -
abatement.01 verb-abate.01 -
abduction.01 verb-abduct.01 -
abolition.01 verb-abolish.01 -
abomination.01 verb-abominate.01 ARG1
abortion.01 verb-abort.01 -
absence.01 verb-absent.01 -
absorber.01 verb-absorb.01 ARG0
absorption.01 verb-absorb.01 -
abuse.01 verb-abuse.01 -
nschneid / zotselect-link.js
Last active January 15, 2024 02:56
Zotero export translator that generates a zotero://select link to an item in the Zotero library. (First a simple version, as well as a version that displays minimal citation information and stores further details as title text.)
nschneid / universal_tags.py
Created December 7, 2012 06:50
Utility for mapping to universal part-of-speech tagset
'''
Interface for converting POS tags from various treebanks
to the universal tagset of Petrov, Das, & McDonald.
The tagset consists of the following 12 coarse tags:
VERB - verbs (all tenses and modes)
NOUN - nouns (common and proper)
PRON - pronouns
ADJ - adjectives
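The preview stops partway through the tag list, so the gist's own interface is not visible here. As a rough illustration of the idea only, the sketch below maps a handful of Penn Treebank tags to coarse universal tags. The names `PTB_TO_UNIVERSAL` and `to_universal` are hypothetical, the table is deliberately partial, and falling back to `X` for unknown tags is an assumption.

```python
# Partial, illustrative table: Penn Treebank tag -> universal coarse tag.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "PRP": "PRON", "DT": "DET", "IN": "ADP",
    "RB": "ADV", "CD": "NUM", "CC": "CONJ",
}


def to_universal(tag, default="X"):
    """Look up the coarse tag; unknown tags fall back to `default` (assumed: X)."""
    return PTB_TO_UNIVERSAL.get(tag, default)


tagged = [("The", "DT"), ("dogs", "NNS"), ("barked", "VBD")]
coarse = [(word, to_universal(tag)) for word, tag in tagged]
```

A plain dictionary lookup is enough here because each fine-grained tag in this subset maps to exactly one coarse tag.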
nschneid / POSMappings.txt
Created September 7, 2013 15:50
Scripts for working with part-of-speech tagsets: describing the morphosyntactic attributes encoded by tags, and converting between different tagsets. Cf. https://gist.github.com/nschneid/4231292
# http://nlp.cs.nyu.edu/wiki/corpuswg/AnnotationCompatibilityReport
# Table 1: Part of Speech Compatibility
# (Initial Version from Manning and Schütze 1999, pp. 141-142)
# Extended to cover Claws1 and ICE
# cf. http://www.scs.leeds.ac.uk/ccalas/tagsets/brown.html
# Nathan Schneider, 2011-02-19:
# * Fixed some errors in brown column, e.g.: DT1 => DTI, PP0 => PPO, NRS => NPS
# * Added last column (Twitter tagset) and several special tags at the end
Category	Examples	Claws c5, Claws1	Brown	PTB	ICE	Twitter
Adjective	happy, bad	AJ0	JJ	JJ	ADJ.ge	A
nschneid / supersenseDefaults.py
Created February 26, 2014 21:56
Script used to load Arabic supersense lexicons (from Arabic WordNet and OntoNotes) and list the possible matches for each token of an input text. One of the imports depends on code in https://github.com/nschneid/pyutil.
#coding=UTF-8
'''
to run the code:
METHOD 1: .stem_pos files
$ export PYTHONPATH=/path/to/AQMAR
$ python2.7 supersenseDefaults.py [mode] ar.stem_pos > ar.lexiconsst
METHOD 2: parallel .tok and .wd_pos_ne.txt files
nschneid / iologreg.py
Created March 24, 2014 20:17
Preliminary attempt at sparse learning in creg2. Non-sparse counterpart code is included for comparison.
import numpy as np
import scipy
import random
import math
import sys
INFINITY = float('inf')
def logadd(a,b):
"""
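The preview ends before the body of `logadd`. Assuming the function adds two quantities stored as logarithms, that is, returns log(exp(a) + exp(b)), a numerically stable version might look like the sketch below. This is an illustration under that assumption, not the gist's actual code.

```python
import math

INFINITY = float('inf')


def logadd(a, b):
    """Return log(exp(a) + exp(b)) without exponentiating large values."""
    # log(0) is -inf; adding zero probability leaves the other term unchanged.
    if a == -INFINITY:
        return b
    if b == -INFINITY:
        return a
    hi, lo = max(a, b), min(a, b)
    # exp(lo - hi) is at most 1, so it cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))
```

Factoring out the larger argument keeps the exponent non-positive, which is why inputs such as `logadd(1000.0, 1000.0)` do not overflow even though `math.exp(1000.0)` would.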
nschneid / pre-commit
Created April 26, 2015 14:19
Prevent git commits that miss files included in a LaTeX project
#!/bin/bash
# Git pre-commit hook to look for untracked files mentioned in the LaTeX and BibTeX logs.
# Fail if any are found. Note that this is not foolproof, as included .tex files
# not generating any errors or warnings may not be mentioned in the log.
#
# Goes in file .git/hooks/pre-commit under the repository root.
#
# Nathan Schneider (nschneid@cs.cmu.edu), 2015-02-26
# Adapted from http://stackoverflow.com/a/10932301
#
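The core check described in the header comments can be sketched in Python, as a simplified illustration rather than the hook itself (which is a bash script). The function name `untracked_mentions` is hypothetical, and the plain substring match is an assumption that ignores details such as LaTeX wrapping long log lines.

```python
def untracked_mentions(log_text, untracked_files):
    """Return the untracked paths that appear verbatim in the log text."""
    return sorted(path for path in untracked_files if path in log_text)


# Toy inputs standing in for a real .log file and a list of untracked paths.
log_text = "(./main.tex (./intro.tex) <figures/plot.pdf>)"
untracked = ["figures/plot.pdf", "notes/todo.txt"]
missing = untracked_mentions(log_text, untracked)
```

In an actual hook, the list of untracked paths could come from `git ls-files --others --exclude-standard`, and exiting with a non-zero status when `missing` is non-empty would abort the commit.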
nschneid / ptbpos2uni.py
Last active June 15, 2017 22:58
Given a new-style Penn Treebank English tree, produce the part-of-speech tags according to the Universal Dependencies project.
#!/usr/bin/env python2.7
'''
Converts new-style PTB POS tags to the English tagset from the Universal Dependencies project
(see universal-pos-en.html, from http://universaldependencies.github.io/docs/en/pos/all.html).
There are 17 such tags, expanded from the original 12 Universal POS tags of Petrov et al. 2011.
See "limitations" comment below for some details on our interpretation of the difficult-to-map
categories.
In new-style PTB, TO only applies to infinitival (not prepositional) "to".
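To make the "17 tags, expanded from the original 12" statement concrete, the sketch below compares the two inventories as sets. The tag names follow Universal Dependencies v1, which was current when the script was written (v2 later renamed `CONJ` to `CCONJ`). The variable names are illustrative and do not come from the script.

```python
# The 12 coarse tags of Petrov et al. (punctuation is written as ".").
UNIVERSAL_12 = {
    "VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
    "CONJ", "DET", "NUM", "PRT", ".", "X",
}

# The 17 tags of Universal Dependencies v1.
UD_17 = {
    "ADJ", "ADP", "ADV", "AUX", "CONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

added_or_renamed = sorted(UD_17 - UNIVERSAL_12)
dropped_or_renamed = sorted(UNIVERSAL_12 - UD_17)
```

Here `PART` and `PUNCT` replace the older `PRT` and `.`, while `AUX`, `INTJ`, `PROPN`, `SCONJ`, and `SYM` are new distinctions.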