George-Bogdan Ivanov bogsio

@bogsio
bogsio / ner.py
Last active February 29, 2020 00:18
NER Python
# https://nlpforhackers.io/named-entity-extraction/
import os
import string
import collections
import pickle
from collections.abc import Iterable  # the bare `collections` alias was removed in Python 3.10
from nltk.tag import ClassifierBasedTagger
from nltk.chunk import ChunkParserI, conlltags2tree, tree2conlltags
@bogsio
bogsio / np_vectorizer.py
Created March 13, 2017 13:32
Train a NP Vectorizer script
import random
from collections.abc import Iterable  # the bare `collections` alias was removed in Python 3.10
from nltk.corpus import conll2000
from nltk import ChunkParserI, ClassifierBasedTagger
from nltk.stem.snowball import SnowballStemmer
from nltk.chunk import conlltags2tree, tree2conlltags
from nltk.tag import pos_tag
from nltk import Tree
from nltk.tokenize import sent_tokenize, word_tokenize
@bogsio
bogsio / GitHub-Forking.md
Created July 14, 2016 18:15 — forked from Chaser324/GitHub-Forking.md
GitHub Standard Fork & Pull Request Workflow

Whether you're trying to give back to the open source community or collaborating on your own projects, knowing how to properly fork and generate pull requests is essential. Unfortunately, when I started going through the process of forking and issuing pull requests, I had some trouble figuring out the proper method for doing so and made quite a few mistakes along the way. I found a lot of the information on GitHub and around the internet to be rather piecemeal and incomplete - part of the process described here, another there, common hangups in a different place, and so on.

In an attempt to collate this information for myself and others, this short tutorial describes what I've found to be a fairly standard procedure for creating a fork, doing your work, issuing a pull request, and merging that pull request back into the original project.

Creating a Fork

Just head over to the GitHub page and click the "Fork" button. It's just that simple. Once you've done that, you can use your favorite git client to clone your

@bogsio
bogsio / demo.py
Created August 13, 2014 22:55
parse demo
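import nltk
from parse import PCFGViterbiParser  # defined in the parse.py gist

# Train on a corpus of bracketed parses, then parse and display one sentence.
viterbi_parser = PCFGViterbiParser.train(open('corpus.txt', 'r'), root='ROOT')
t = viterbi_parser.parse(
    nltk.word_tokenize('Numerous passing references to the phrase have occurred in movies'))
print(t)
t.draw()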
@bogsio
bogsio / parse.py
Created August 13, 2014 22:47
PCFGViterbiParser
import nltk
from nltk.grammar import WeightedProduction, Nonterminal
from util import corpus2trees, trees2productions
class PCFGViterbiParser(nltk.ViterbiParser):
    def __init__(self, grammar, trace=0):
        super(PCFGViterbiParser, self).__init__(grammar, trace)

    @staticmethod
@bogsio
bogsio / util.py
Created August 13, 2014 22:43
Parsing utilities
from nltk import Tree
import logging
def corpus2trees(text):
    """ Parse the corpus and return a list of Trees """
    rawparses = text.split("\n\n")
    trees = []
    for rp in rawparses:
(ROOT
  (S
    (NP (PRP I))
    (VP (VBP am)
      (NP
        (NP (NNP Bogdan))
        (PRN (-LRB- -LRB-)
          (ADJP (JJ new) (RB here))
          (-RRB- -RRB-))))
    (. .)))
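The util.py preview above is cut off, so the following is only a rough, standard-library sketch of the same idea: split a corpus on blank lines, read each bracketed parse into a nested structure, and list the productions it contains. It is not the gist's implementation (which builds `nltk.Tree` objects); the names `parse_bracketed`, `corpus_to_trees`, and `productions` are hypothetical and exist only for this illustration.

```python
def parse_bracketed(text):
    """Read one bracketed parse into nested (label, children) tuples."""
    # Pad parentheses so that a plain split() yields clean tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def corpus_to_trees(text):
    """Split on blank lines (as corpus2trees does) and parse each chunk."""
    return [parse_bracketed(rp) for rp in text.split("\n\n") if rp.strip()]


def productions(tree):
    """List (lhs, rhs) pairs in pre-order; rhs holds child labels or leaf words."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    result = [(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            result.extend(productions(child))
    return result


# Hypothetical two-sentence corpus, shortened from the sample parse above.
sample = "(ROOT (S (NP (PRP I)) (VP (VBP am) (NP (NNP Bogdan))) (. .)))"
trees = corpus_to_trees(sample + "\n\n" + sample)
for lhs, rhs in productions(trees[0]):
    print(lhs, "->", " ".join(rhs))
```

Counting how often each (lhs, rhs) pair occurs across all trees would be the usual next step toward estimating rule probabilities for a PCFG.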
@bogsio
bogsio / tomany.json
Created July 25, 2014 11:19
TPT3 - add tomany keys
{
  "meta":{
    "limit":20,
    "next":null,
    "offset":0,
    "previous":null,
    "total_count":1
  },
  "objects":[
    {
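The JSON preview above is truncated, but its visible shape is a `meta` block (`limit`, `next`, `offset`, `previous`, `total_count`) followed by an `objects` list. As a minimal sketch of how a client might read that envelope, assuming only those visible keys, the snippet below summarizes one page. The function name `summarize_page` and the `{"id": 1}` object are hypothetical placeholders, since the real object fields are cut off in the preview.

```python
import json


def summarize_page(raw):
    """Summarize one list response with the meta/objects shape shown above."""
    page = json.loads(raw)
    meta = page["meta"]
    return {
        "returned": len(page["objects"]),    # objects on this page
        "total": meta["total_count"],        # objects across all pages
        "has_next": meta["next"] is not None,
    }


# Hypothetical payload: the meta values mirror the preview, the object is a placeholder.
sample = (
    '{"meta": {"limit": 20, "next": null, "offset": 0, '
    '"previous": null, "total_count": 1}, '
    '"objects": [{"id": 1}]}'
)
print(summarize_page(sample))
```

A `next` value of `null` means there is no further page to request, which is why the sketch checks it instead of comparing `offset` and `limit` against `total_count`.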
@bogsio
bogsio / api.py
Created July 25, 2014 11:18
TPT3 - adding toMany
# ...
class TodoListResource(ModelResource):
    item_urls = fields.ToManyField('todolist.api.TodoItemResource', attribute='items')
    # ...
@bogsio
bogsio / fk.json
Created July 25, 2014 11:02
TPT3 - json with foreign keys
{
  "meta":{
    "limit":20,
    "next":null,
    "offset":0,
    "previous":null,
    "total_count":2
  },
  "objects":[
    {