
"""Query AlchemyAPI to determine number of API calls still available"""
# -*- coding: utf-8 -*-
import json
import requests
def get_api_key():
# Load API key (40 HEX character key) from local file
key = open('api_key.txt').readline().strip()
return key
alvations / nltk-intro.py
Created October 1, 2015 12:58 — forked from alexbowe/nltk-intro.py
Demonstration of extracting key phrases with NLTK in Python
import nltk
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
# Used when tokenizing words
sentence_re = r'''(?x)          # set flag to allow verbose regexps
    ([A-Z])(\.[A-Z])+\.?        # abbreviations, e.g. U.S.A.
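The pattern above is cut off in the preview; a hedged, self-contained reconstruction of the same idea (my own simplified pattern, not the gist's full one) looks like:

```python
import nltk

text = "The U.S.A. has state-of-the-art technology."

# Verbose regex: each alternative handles one token shape.
pattern = r'''(?x)              # allow verbose regexps
      (?:[A-Z]\.)+              # abbreviations, e.g. U.S.A.
    | \w+(?:-\w+)*              # words with optional internal hyphens
    | \$?\d+(?:\.\d+)?%?        # currency and percentages
    | \.\.\.                    # ellipsis
'''

toks = nltk.regexp_tokenize(text, pattern)
print(toks)  # ['The', 'U.S.A.', 'has', 'state-of-the-art', 'technology']
```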
"""
Programming task
================
Implement the method iter_sample below to make the unit test pass. iter_sample
is supposed to peek at the first n elements of an iterator and determine the
minimum and maximum values (using their comparison operators) found in that
sample. To make it more interesting, the method is supposed to return an
iterator which will yield the same exact elements that the original one would
have yielded, i.e. the first n elements can't be missing.
"""
Programming task
================
The following is an implementation of a simple Named Entity Recognition (NER) system.
NER is concerned with identifying place names, people's names or other special
identifiers in text.
Here we make a very simple definition of a named entity: a sequence of
at least two consecutive capitalized words. E.g. "Los Angeles" is a named entity.
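Under that toy definition, the recognizer can be sketched as a single regular expression (pattern and function names are my own):

```python
import re

# Two or more consecutive capitalized words, per the toy definition above.
NE_PATTERN = re.compile(r'(?:[A-Z][a-z]+ )+[A-Z][a-z]+')

def find_named_entities(text):
    return NE_PATTERN.findall(text)

print(find_named_entities("I flew from Los Angeles to New York City yesterday."))
# ['Los Angeles', 'New York City']
```

Note that a lone capitalized word like "I" is not matched, since the pattern requires at least two capitalized words in a row.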

Entropy and WSD.

Let p(x) be the probability mass function of a random variable X over a discrete set of symbols X:

p(x) = P(X = x)

For example, if we toss two coins and count the number of heads, we have a random variable with p(0) = 1/4, p(1) = 1/2 and p(2) = 1/4.
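The entropy of that distribution is H(X) = -sum_x p(x) log2 p(x); for the two-coin example this comes to 1.5 bits, which a short sketch can verify:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p(x) * log2 p(x)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Number of heads in two fair coin tosses: p(0)=1/4, p(1)=1/2, p(2)=1/4.
print(entropy([0.25, 0.5, 0.25]))  # 1.5
```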

"""
This is a script used to clean control characters from the
- NTU Multilingual Corpus (http://web.mysites.ntu.edu.sg/fcbond/open/pubs/2012-ijalp-ntumc.pdf)
- SeedLing Corpus (http://www.aclweb.org/anthology/W/W14/W14-2211.pdf)
- DSL Corpus Collection (https://comparable.limsi.fr/bucc2014/4.pdf)
"""
import re
import unicodedata
# A full list of unicode characters.
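A minimal sketch of the core cleaning step such a script might use, stripping characters in Unicode category 'Cc' (the choice to keep tabs and newlines is my own):

```python
import unicodedata

def remove_control_chars(text, keep='\t\n'):
    # unicodedata.category returns 'Cc' for control characters.
    return ''.join(ch for ch in text
                   if ch in keep or unicodedata.category(ch) != 'Cc')

print(repr(remove_control_chars('hello\x00world\n')))  # 'helloworld\n'
```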

NLTK API to Stanford NLP Tools compiled on 2015-12-09

Stanford NER

With NLTK version 3.1 and Stanford NER tool 2015-12-09, it is possible to hack the StanfordNERTagger._stanford_jar to include other .jar files that are necessary for the new tagger.

First set up the environment variables as instructed at https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software
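A sketch of what that jar hack might look like. The directory layout and extra jar below are assumptions based on a typical stanford-ner-2015-12-09 download, and the NLTK lines are commented out since they need a local Java install:

```python
import os

# Assumed location of the unpacked Stanford NER 2015-12-09 release.
stanford_dir = '/usr/local/stanford-ner-2015-12-09'
jars = [
    os.path.join(stanford_dir, 'stanford-ner.jar'),
    os.path.join(stanford_dir, 'lib', 'joda-time.jar'),  # hypothetical extra jar
]

# os.pathsep is ':' on Unix and ';' on Windows, matching Java classpath rules.
classpath = os.pathsep.join(jars)
print(classpath)

# With NLTK 3.1, the extra jars can then be injected into the tagger:
# from nltk.tag import StanfordNERTagger
# tagger = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
# tagger._stanford_jar = classpath
```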

alvations / BLEU.py
Last active January 22, 2021 16:40
# -*- coding: utf-8 -*-
"""BLEU.
Usage:
bleu.py --reference FILE --translation FILE [--weights STR] [--smooth STR] [--smooth-epsilon STR] [--smooth-alpha STR] [--smooth-k STR] [--segment-level]
bleu.py -r FILE -t FILE [-w STR] [--smooth STR] [--segment-level]
Options:
-h --help Show this screen.
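As a sanity check for any BLEU script, NLTK's own implementation in nltk.translate returns 1.0 for an exact match:

```python
from nltk.translate.bleu_score import sentence_bleu

reference = 'the quick brown fox jumps over the lazy dog'.split()
hypothesis = 'the quick brown fox jumps over the lazy dog'.split()

# sentence_bleu takes a list of reference token lists and one hypothesis.
score = sentence_bleu([reference], hypothesis)
print(score)  # 1.0 for an exact match
```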
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer, WordNetLemmatizer
#from nltk import pos_tag, word_tokenize
# Pywsd's Lemmatizer.
porter = PorterStemmer()
wnl = WordNetLemmatizer()
from nltk.tag import PerceptronTagger
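A quick illustration of the stemmer set up above (classic Porter examples; the lemmatizer additionally needs the wordnet corpus via nltk.download('wordnet'), so it is only sketched in a comment):

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()
# The Porter stemmer chops suffixes by rule, so it needs no corpus data.
print(porter.stem('running'))   # run
print(porter.stem('caresses'))  # caress

# By contrast, wnl.lemmatize('running', pos='v') returns 'run' via a WordNet
# lookup -- it requires nltk.download('wordnet') first.
```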

Getting Stanford NLP and MaltParser to work in NLTK for Windows Users

Firstly, I strongly think that if you're working with NLP/ML/AI related tools, getting things to work on Linux and Mac OS is much easier and saves you quite a lot of time.

Disclaimer: I am not affiliated with Continuum (conda), Git, Java, Windows OS or the Stanford NLP or MaltParser groups. And the steps presented below are how I, IMHO, would set up a Windows computer if I owned one.

Please, please, please understand the solution, don't just copy and paste! We're not monkeys typing Shakespeare ;P