@Slater-Victoroff
Slater-Victoroff / gist:5235788
Created March 25, 2013 08:54
Cleaning Strings to tokens with stemming and url removal
public Set<String> parseRawString(String rawString, SnowballStemmer stemmer){
    Set<String> answer = new HashSet<String>();
    // Split the raw input on tabs, newlines and carriage returns.
    String[] firstSplit = rawString.split("[\\t\\n\\r]");
    List<String> rawSplit = new ArrayList<String>();
    // A chunk that parses as a URL is dropped; anything else is split on punctuation.
    for (String s: firstSplit) try{
        URL url = new URL(s);
    } catch (MalformedURLException e){
        rawSplit.addAll(Arrays.asList(s.split("[\\p{P}]")));
    }
    for (String s: rawSplit){
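The preview above is cut off before the stemming step, so the following is only a rough Python analogue of the visible part: split on tabs and newlines, drop chunks that look like URLs, then split the rest on punctuation. The names `looks_like_url` and `tokens_without_urls` are hypothetical, the `[^\w\s]` pattern only approximates Java's `\p{P}`, the extra whitespace split is an assumption, and `stem` defaults to lowercasing as a placeholder for a real stemmer.

```python
import re
from urllib.parse import urlparse


def looks_like_url(text):
    """Rough stand-in for Java's `new URL(s)` check: known scheme plus a host."""
    try:
        parsed = urlparse(text.strip())
    except ValueError:
        return False
    return parsed.scheme in {"http", "https", "ftp"} and bool(parsed.netloc)


def tokens_without_urls(raw_string, stem=str.lower):
    """Return the set of cleaned tokens in raw_string, skipping URL chunks.

    `stem` is a placeholder hook; pass a real stemmer's function to stem tokens.
    """
    tokens = set()
    for chunk in re.split(r"[\t\n\r]", raw_string):
        if looks_like_url(chunk):
            continue  # URLs are removed entirely, as in the Java snippet
        for piece in re.split(r"[^\w\s]", chunk):
            # Assumption: remaining pieces are further split on whitespace.
            for word in piece.split():
                tokens.add(stem(word))
    return tokens
```

Calling `tokens_without_urls("Hello, world!\nhttp://example.com/page")` would keep the words from the first line and discard the second line as a URL.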
Slater-Victoroff / Synonym_checker
Created June 5, 2013 13:23
General case synonym matching using nltk.
from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer
import itertools

def Synonym_Checker(word1, word2):
    """Checks if word1 and word2 are synonyms. Returns True if they are, otherwise False."""
    equivalence = WordNetLemmatizer()
    word1 = equivalence.lemmatize(word1)
Slater-Victoroff / PyGrep
Created June 28, 2013 18:22
Simple grepping for files in python in a nice useful way.
import os

class PyGrep:

    def __init__(self, directory):
        self.directory = directory

    def grab_all_files_with_ending(self, file_ending):
        """Will return absolute paths to all files with given file ending in self.directory"""
        walk_results = os.walk(self.directory)
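The method above is truncated right after the `os.walk` call. A minimal sketch of how such a walk could collect matching files follows; the standalone function `files_with_ending` is a hypothetical name, and returning a sorted list is an assumption rather than the gist's actual behavior.

```python
import os


def files_with_ending(directory, file_ending):
    """Return sorted absolute paths of files under directory ending in file_ending."""
    matches = []
    # os.walk yields (current_dir, subdirectory_names, file_names) for every level.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(file_ending):
                matches.append(os.path.abspath(os.path.join(root, name)))
    return sorted(matches)
```

For example, `files_with_ending(".", ".py")` would list every Python file below the current directory.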
Slater-Victoroff / ElasticEnchant
Created June 28, 2013 18:23
Pyenchant spell checking with ElasticSearch
class ElasticEnchant:

    def __init__(self, esDatabase):
        self.es_instance = esDatabase

    def produce_dictionary(self, output_file, **kwargs):
        """Produces a dictionary or updates it depending on kwargs
        If no kwargs are given then this method will write a full dictionary including all
        entries in all indices and types and output it in an enchant-friendly way to the output file.
from flask import Blueprint, request, redirect, render_template, url_for
from flask.views import MethodView
from flask_mongoengine.wtf import model_form  # the flask.ext.* import style was removed in Flask 1.0
from SpoolEngine.auth import requires_auth
from SpoolEngine.models import Post, BlogPost, Video, Image, Quote, Comment
admin = Blueprint('admin', __name__, template_folder='templates')
Slater-Victoroff / gist:6156734
Created August 5, 2013 15:19
ElasticSearch Settings file
{
    "mappings": {
        "properties": {
            "searchable_text": {
                "type": "multi_field",
                "fields": {
                    "full_words": {
                        "type": "string",
                        "store": "yes",
                        "index": "analyzed",
Slater-Victoroff / gist:6193947
Created August 9, 2013 14:19
The most functional line of python
def random_document(document_layout, type, **kwargs):
    document_layout.append("dummy_item")
    doc = lambda l:{l[i]:doc(l[i+1]) if isinstance(l[i+1],list) else random_item(type,**kwargs) for i in range(len(l)-1)}
    return doc(document_layout)
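The one-liner above is dense, mutates its argument, and would use a nested list as a dictionary key once the loop index reaches it. A more explicit sketch of what it appears to be aiming for is given here. The layout convention (a name followed by a list owns that list as a nested document) is an assumption, and `random_item` is a hypothetical stand-in because the gist does not show its definition.

```python
import random


def random_item(kind):
    """Hypothetical stand-in for the gist's undefined random_item helper."""
    if kind == "int":
        return random.randint(0, 100)
    return "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))


def random_document(layout, kind):
    """Build a nested dict from a layout such as ["a", "b", ["c", "d"], "e"].

    Assumed convention: layouts start with a name, and a name followed by a
    list gets that list as a nested document. Other names get a random leaf.
    """
    document = {}
    i = 0
    while i < len(layout):
        key = layout[i]
        if i + 1 < len(layout) and isinstance(layout[i + 1], list):
            document[key] = random_document(layout[i + 1], kind)
            i += 2  # skip the nested layout that was just consumed
        else:
            document[key] = random_item(kind)
            i += 1
    return document
```

Unlike the lambda version, this sketch leaves the caller's layout list untouched and needs no trailing dummy element.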
Slater-Victoroff / PyMarkov
Last active March 28, 2022 13:55
Arbitrary ply markov constructor in python
from collections import Counter
import cPickle as pickle
import random
import itertools
import string

def words(entry):
    return [word.lower().decode('ascii', 'ignore') for word in entry.split()]

def letters(entry):
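The preview stops before the chain itself is built. As a rough Python 3 illustration of an arbitrary-ply (n-gram) Markov model, the sketch below maps each run of `ply` tokens to counts of what follows it. The names `build_chain` and `generate` are hypothetical and not taken from the gist; the token list can come from a word-level or letter-level tokenizer such as the `words` and `letters` helpers above.

```python
import random
from collections import Counter, defaultdict


def build_chain(tokens, ply):
    """Map each run of `ply` consecutive tokens to a Counter of its successors."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - ply):
        state = tuple(tokens[i:i + ply])
        chain[state][tokens[i + ply]] += 1
    return chain


def generate(chain, start, length, rng=random):
    """Extend `start` (a sequence of `ply` tokens) up to `length` tokens.

    Successors are sampled in proportion to how often they were observed.
    """
    output = list(start)
    ply = len(start)
    while len(output) < length:
        successors = chain.get(tuple(output[-ply:]))
        if not successors:
            break  # dead end: this state was never followed by anything
        choices, weights = zip(*successors.items())
        output.append(rng.choices(choices, weights=weights)[0])
    return output
```

A higher `ply` makes the output follow the source text more closely, while a lower one produces looser, more random sequences.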
Slater-Victoroff / gist:6299996
Created August 21, 2013 20:42
Determine the most random language
from pyfuzz.generator import random_regex
from guess_language import guessLanguageName
from collections import Counter

def randomness_histogram(iterations, length):
    # Guess the language of `iterations` random strings, each `length` characters long.
    return Counter((guessLanguageName(random_regex(regex="[a-z \n]", length=length)) for i in xrange(iterations)))

print randomness_histogram(1000, 200).most_common(10) # prints the 10 most common languages
Slater-Victoroff / gist:6681032
Last active December 23, 2015 19:09
Simple Flask Server for parsing LabView XML and doing some scant logic.
import flask
import sys
from flask import Flask, request
import random
from lxml import etree
import xmltodict
import numpy as np
import operator