Dax Gerts SlightlyUnorthodox

@SlightlyUnorthodox
SlightlyUnorthodox / nato_exports.R
Last active February 7, 2017 20:05
Quick analysis of proportion of US Exports attributed to NATO members
# Title: US Exports to NATO Members as Percentage of GDP
# Author: Dax Gerts
# Date: February 6th, 2017
# Description: a quick analysis of proportion of US Exports attributed to NATO members
# Load dependencies
library('openxlsx')
library('ggplot2')
# Get Census Data and parse as data frame
@SlightlyUnorthodox
SlightlyUnorthodox / python_regex_examples.py
Created October 13, 2016 19:14
Some simple examples of applications of Python's 're' library.
# The following are some uses for regular expressions in Python
# Import the regex library
import re
# Create a test string
test_string = "This is a test string with some numbers, '123456', and some letters, 'abcdef'."
# Now we have some uses
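The preview above stops before any of the uses are shown. As a minimal sketch of what such uses might look like (the specific patterns below are illustrative assumptions, not the gist's own examples), the standard `re` functions `findall`, `search`, `sub`, and `split` can be applied to the same test string:

```python
import re

test_string = "This is a test string with some numbers, '123456', and some letters, 'abcdef'."

# Find every run of digits in the string
digits = re.findall(r"\d+", test_string)

# Capture the contents of each single-quoted substring
quoted = re.findall(r"'([^']*)'", test_string)

# Locate the first occurrence of the word "test"
first = re.search(r"test", test_string)

# Replace each digit run with a placeholder
masked = re.sub(r"\d+", "<num>", test_string)

# Split on a comma followed by optional whitespace
parts = re.split(r",\s*", test_string)

print(digits, quoted, first.start(), masked, parts, sep="\n")
```

Raw strings (`r"..."`) keep backslashes in the patterns from being interpreted as Python string escapes.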
@SlightlyUnorthodox
SlightlyUnorthodox / passgen.py
Created March 7, 2016 02:26
example passphrase generation
# Passphrase generation v1.0
# Author: Dax Gerts
# Date: 6 March 2016
# Description: example passphrase generation
from nltk.corpus import words
import numpy as np
# Prompt for passphrase length and store as int
pass_length = raw_input("How many words should your pass-phrase have? (enter #) ")
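The rest of the script is not visible in this preview. The sketch below shows one way the idea could be completed in Python 3, under stated assumptions: it swaps in the standard-library `secrets` module in place of `numpy`, and uses a small hypothetical word list (`WORDS`) as a stand-in for `nltk.corpus.words`, so that it runs without downloading a corpus. The helper names `parse_length` and `make_passphrase` are invented for illustration.

```python
import secrets

# Hypothetical stand-in word list; the original script draws from nltk.corpus.words
WORDS = ["apple", "bridge", "candle", "drift", "ember", "forest", "granite", "harbor"]


def parse_length(raw):
    """Convert the user's text input to a positive int, or raise ValueError."""
    n = int(raw.strip())
    if n < 1:
        raise ValueError("pass-phrase length must be at least 1")
    return n


def make_passphrase(n_words, words=WORDS, sep=" "):
    """Join n_words words, each chosen independently and uniformly at random."""
    return sep.join(secrets.choice(words) for _ in range(n_words))


# In an interactive script, the string "4" would come from input(...)
phrase = make_passphrase(parse_length("4"))
print(phrase)
```

`secrets.choice` draws from the operating system's cryptographically secure random source, which is generally preferable to a general-purpose random number generator when the output is used as a credential.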
@SlightlyUnorthodox
SlightlyUnorthodox / definition.py
Last active March 7, 2016 17:57
Quick and dirty definition retrieval using WordNet with NLTK
# Definition Retrieval v1.0
# Author: Dax Gerts
# Date: 6 March 2016
# Description: example definition lookup using WordNet in NLTK
# Uses WordNet's word lookup and definition
from nltk.corpus import wordnet as wn
# Prompt for a word to define
word = raw_input("Enter a word to define: ")
@SlightlyUnorthodox
SlightlyUnorthodox / BenchmarkingWordCount2.r
Created November 19, 2015 05:33
Counts words in twogram data and sets time standard
# Title: WordCount2
# Author: Dax Gerts
# Date: 13 November 2015
# Runtime: 47.213 seconds
# Results: 4,068,566,751,420 words
require(gtBase)
data = Read(ngrams2)
@SlightlyUnorthodox
SlightlyUnorthodox / BenchmarkingWordCount1.r
Created November 19, 2015 05:33
Counts words in onegram data and sets time standard
# Title: WordCount1
# Author: Dax Gerts
# Date: 13 November 2015
# Runtime: 8.947 seconds
# Results: 1,320,305,357,364 words
require(gtBase)
data = Read(ngrams1)
@SlightlyUnorthodox
SlightlyUnorthodox / BenchmarkingTupleCount2.r
Created November 19, 2015 05:32
Counts tuples in twogram data and sets time standard
# Title: TupleCount2
# Author: Dax Gerts
# Date: 13 November 2015
# Runtime: 16.481 seconds
# Results: 37,582,158,107 tuples
require(gtBase)
data = Read(ngrams2)
@SlightlyUnorthodox
SlightlyUnorthodox / BenchmarkingTupleCount1.r
Created November 19, 2015 05:31
Counts tuples in onegram data and sets time standard
# Title: TupleCount1
# Author: Dax Gerts
# Date: 13 November 2015
# Runtime: 8.218 seconds
# Results: 1,430,731,493 tuples
require(gtBase)
data = Read(ngrams1)
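The four benchmarks report two different quantities, and the query bodies are not visible here. As a rough sketch of the distinction, assuming each row of the n-gram data carries an occurrence count (the row layout and all values below are hypothetical), a tuple count is the number of rows while a word count is the sum of the per-row counts:

```python
# Hypothetical rows of the form (ngram, year, match_count); values are made up
rows = [
    ("the", 2007, 5),
    ("the", 2008, 7),
    ("cat", 2008, 2),
]


def tuple_count(rows):
    """Number of rows (tuples) in the data set."""
    return sum(1 for _ in rows)


def word_count(rows):
    """Total occurrences, summing the match count of every row."""
    return sum(count for _, _, count in rows)


print(tuple_count(rows), word_count(rows))
```

Under that assumption the word count would always be at least as large as the tuple count, which is consistent with the figures reported in the benchmark headers.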
@SlightlyUnorthodox
SlightlyUnorthodox / Zipf1.r
Created November 19, 2015 05:29
Verifies Zipf's Law on both one- and two-grams
# Query: Zipf1
# Author: Dax Gerts
# Date: 11 November 2015 (UPDATED)
# Runtime: (1)-gram, 10,000, 100 years, approx 77 seconds
# (2)-gram, 10,000, 100 years, approx 720 seconds
# Description: Prepare frequency table for Zipf analysis
require(gtBase)
require(gtStats)
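Zipf's law states that an item's frequency is roughly inversely proportional to its frequency rank, so a log-log plot of frequency against rank should be close to a straight line with slope near -1. The sketch below is not the gist's Grokit query; it is a minimal Python illustration that fits that slope by ordinary least squares, using synthetic frequencies.

```python
import math


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Expects at least two strictly positive frequencies.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic frequencies that follow f = 1000 / rank exactly
synthetic = [1000 / rank for rank in range(1, 51)]
slope = zipf_slope(synthetic)
print(round(slope, 6))
```

On real n-gram counts the fitted slope would only approximate -1, and the fit is typically poorer at the extreme head and tail of the ranking.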
@SlightlyUnorthodox
SlightlyUnorthodox / Collocation2008.r
Created November 19, 2015 05:27
Identifies collocations via statistical methods from one- and two-gram data
# Query: Collocation2008
# Author: Dax Gerts
# Date: 4 November 2015
# Runtime Without Segmenter: approx. 12.83 minutes (with filter at > 0.000001, one in 100K); no convergence (with filter at > 0.0000001, one in 100K)
# Runtime With Segmenter: approx. 12.85 minutes (with filter at > 0.000001, one in 100K); approx. 14.03 minutes (with filter at > 0.0000001, one in 1M)
# Description: Attempts to identify English collocations in 2008 by comparing the relative frequency of naturally occurring bigrams to independently calculated bigrams
library(gtBase)
library(methods)
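The description compares how often a bigram actually occurs with how often it would occur if its two words were independent. The sketch below illustrates that comparison in Python on a toy token list. It is not the gist's query: the function name, the `min_bigram_freq` parameter, and the choice to filter on observed relative frequency are all assumptions made for illustration.

```python
from collections import Counter


def collocation_ratios(tokens, min_bigram_freq=0.0):
    """Map each bigram to observed / expected relative frequency.

    expected is the product of the two words' unigram relative frequencies,
    i.e. the bigram's frequency if the words occurred independently.
    Bigrams whose observed relative frequency is not above min_bigram_freq
    are skipped.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    ratios = {}
    for (w1, w2), count in bigrams.items():
        observed = count / n_bi
        if observed <= min_bigram_freq:
            continue
        expected = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        ratios[(w1, w2)] = observed / expected
    return ratios


tokens = "new york is big and new york is old and big is new".split()
ratios = collocation_ratios(tokens)
frequent = collocation_ratios(tokens, min_bigram_freq=0.1)
print(ratios[("new", "york")], sorted(frequent))
```

A ratio well above 1 suggests the pair occurs together more often than chance would predict. On a corpus this small the ratios are not meaningful, which is one reason a minimum-frequency filter is useful on real data.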