// Solarized for Mosh Chrome
// Mosh does not seem to save profiles, so this sets the default 'mosh' profile.
// To reset, evaluate term_.prefs_.resetAll()
// Run in the JavaScript console of mosh_browser.html, which can be opened
// as explained here:
// https://github.com/rpwoodbu/mosh-chrome/wiki/FAQ#how-can-i-change-the-way-the-terminal-looks-font-color-etc
var htermProfiles = {
  // Solarized Dark
{
  "origin": ["I accuse #suspect# of committing the crime in the #room# with the #weapon#!"],
  "suspect": ["Miss Scarlett", "Colonel Mustard", "Mrs. White", "Reverend Green", "Mrs. Peacock", "Professor Plum", "Miss Peach", "Monsieur Brunette", "Madame Rose", "Sergeant Gray"],
  "room": ["kitchen", "ballroom", "conservatory", "dining room", "billiard room", "library", "study", "hall", "lounge"],
  "weapon": ["candlestick", "knife", "lead pipe", "dagger", "revolver", "rope", "wrench"]
}
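The `#symbol#` placeholders follow the Tracery grammar convention: each tag is replaced by a randomly chosen entry from the list with that name, recursively. The sketch below is a minimal Python re-implementation of that expansion for this grammar, not the Tracery library itself. It handles plain `#symbol#` tags only (no modifiers such as `.capitalize`), and the `expand` helper is hypothetical.

```python
import random
import re

# The grammar from the JSON above, loaded as a Python dict.
grammar = {
    "origin": ["I accuse #suspect# of committing the crime in the #room# with the #weapon#!"],
    "suspect": ["Miss Scarlett", "Colonel Mustard", "Mrs. White", "Reverend Green",
                "Mrs. Peacock", "Professor Plum", "Miss Peach", "Monsieur Brunette",
                "Madame Rose", "Sergeant Gray"],
    "room": ["kitchen", "ballroom", "conservatory", "dining room", "billiard room",
             "library", "study", "hall", "lounge"],
    "weapon": ["candlestick", "knife", "lead pipe", "dagger", "revolver", "rope", "wrench"],
}

# Matches a plain #symbol# tag (hypothetical subset of Tracery syntax).
TAG = re.compile(r"#(\w+)#")

def expand(symbol, grammar, rng=random):
    """Pick one rule for `symbol` at random and recursively expand its #tags#."""
    rule = rng.choice(grammar[symbol])
    return TAG.sub(lambda m: expand(m.group(1), grammar, rng), rule)

if __name__ == "__main__":
    print(expand("origin", grammar, random.Random(42)))
```

Passing a seeded `random.Random` makes an accusation reproducible, which is handy when testing a grammar.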
READPY=$(git log --name-only HEAD^.. | grep "^README.ipynb$")
READMD=$(git log --name-only HEAD^.. | grep "^README.md$")
if [ -n "$READPY" ] && [ -z "$READMD" ]; then
    echo "It looks like a new README was committed, appending a Markdown version"
    jupyter nbconvert --to markdown README.ipynb
    # Adding this file doesn't work in pre-commit hooks, which is
    # why we're appending post-commit
    git add README.md
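The hook regenerates the Markdown only when the last commit touched `README.ipynb` but not `README.md`. That decision can be sketched in Python as a pure function, which makes it easy to test. The function names are hypothetical, and listing files with `git diff-tree` is an assumed substitute for the `git log --name-only HEAD^..` call above (the two agree for an ordinary, non-merge commit).

```python
import subprocess

def files_in_last_commit():
    """List the paths changed by HEAD relative to its parent."""
    out = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()

def needs_markdown_readme(changed):
    """True when the notebook README changed but its Markdown export did not."""
    changed = set(changed)
    return "README.ipynb" in changed and "README.md" not in changed
```

Inside a hook, the check would be `needs_markdown_readme(files_in_last_commit())`; keeping the git call separate means the logic can be tested without a repository.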
'''
Author: Peter Organisciak
Convert a Day of DH (or other WordPress) export to Mallet import format.
Each output line has the form:
[url] [user] [post text]
Usage:
>> python process.py input-file output-file --split [post|author]
For the split argument, choose either post (each document is the text of a single post) or author (each document is all the text an author has written).
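The two `--split` modes differ only in how posts are grouped before writing. A minimal sketch, assuming the export has already been parsed into dicts with hypothetical `url`, `user`, and `text` keys (the WordPress parsing step isn't shown here). When grouping by author, it uses the author name in place of a URL as the document name; that choice is also an assumption.

```python
def to_mallet_lines(posts, split="post"):
    """Yield '[name] [label] [text]' lines, one document per line.

    posts: iterable of dicts with 'url', 'user', 'text' keys (hypothetical layout).
    split: 'post' keeps one document per post; 'author' concatenates each author's posts.
    """
    if split == "post":
        for p in posts:
            # Collapse whitespace and newlines so each document stays on one line.
            yield "{} {} {}".format(p["url"], p["user"], " ".join(p["text"].split()))
    elif split == "author":
        by_author = {}  # dicts keep insertion order (Python 3.7+)
        for p in posts:
            by_author.setdefault(p["user"], []).append(" ".join(p["text"].split()))
        for user, texts in by_author.items():
            yield "{} {} {}".format(user, user, " ".join(texts))
    else:
        raise ValueError("split must be 'post' or 'author'")
```

Mallet's `import-file` command, by default, reads each line as an instance name, a label, and then the text, which is why the URL (or author) comes first.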
from htrc_features import FeatureReader
import argparse
import pandas as pd
import numpy as np
import random
import string

def main():
    parser = argparse.ArgumentParser(description='Calculate Collection '
tl = vol.tokenlist(pages=False)
just_nouns = tl.loc[(slice(None), slice(None), ["NN", "NNS"]),]
top_nouns = just_nouns.sort_values('count', ascending=False)
top_nouns.head(5)
# OUTPUT:
#                       count
# section token  pos
# body    doctor NN        83
#         time   NN        80
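Since an `htrc_features` volume isn't available here, the sketch below rebuilds the same selection on a small hand-made DataFrame with the (section, token, pos) index shown in the output; the counts are invented for illustration. It uses `pd.IndexSlice`, which is equivalent to writing out the `slice(None)` tuple, and sorts the index first because MultiIndex slicing in pandas works most reliably on a sorted index.

```python
import pandas as pd

# Toy token counts (made-up values), indexed like a page-collapsed tokenlist.
index = pd.MultiIndex.from_tuples(
    [("body", "house", "NN"), ("body", "eyes", "NNS"), ("body", "said", "VBD"),
     ("body", "the", "DT"), ("body", "door", "NN")],
    names=["section", "token", "pos"],
)
tl = pd.DataFrame({"count": [12, 9, 30, 50, 5]}, index=index).sort_index()

idx = pd.IndexSlice
# Keep every section and token, but only singular (NN) and plural (NNS) common nouns.
just_nouns = tl.loc[idx[:, :, ["NN", "NNS"]], :]
top_nouns = just_nouns.sort_values("count", ascending=False)
print(top_nouns.head(5))
```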
Field  Coverage  Description
035    100%      SYSTEM CONTROL NUMBER (R)
245    100%      TITLE STATEMENT (NR)
538    100%      SYSTEM DETAILS NOTE (R)
974    100%      NA
260    99.7%     PUBLICATION, DISTRIBUTION, ETC. (IMPRINT) (R)
300    99.0%     PHYSICAL DESCRIPTION (R)
040    95.8%     CATALOGING SOURCE (NR)
100    73.9%     MAIN ENTRY--PERSONAL NAME (NR)
650    64.9%     SUBJECT ADDED ENTRY--TOPICAL TERM (R)

(R) = repeatable field, (NR) = non-repeatable field; NA = no standard description (9XX fields are locally defined).
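Coverage most plausibly means the share of records in which a field appears at least once, so a repeatable (R) field that occurs several times in one record still counts once. A minimal sketch of that calculation, assuming each record has already been reduced to a list of its field tags (parsing the MARC itself, e.g. with a library such as pymarc, is left out, and the sample records are made up):

```python
from collections import Counter

def field_coverage(records):
    """Return (tag, percent) pairs: share of records containing each field at least once."""
    counts = Counter()
    for fields in records:
        # set() counts a repeated tag only once per record.
        counts.update(set(fields))
    n = len(records)
    # Highest coverage first; ties broken by tag.
    return sorted(((tag, 100.0 * c / n) for tag, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))

records = [
    ["035", "245", "650", "650"],
    ["035", "245", "100"],
    ["035", "245"],
    ["035", "245", "100", "650"],
]
for tag, pct in field_coverage(records):
    print(f"{tag}  {pct:.1f}%")
```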
import numpy as np
import pandas as pd

def calculate_tfidf(tokencounts, idf_df, df='PF', case=True, log_tf=True):
    '''Takes a 'token, count' DataFrame and returns TF*IDF weights.'''
    # Work on a copy so the caller's DataFrame isn't modified.
    tc = tokencounts.copy()
    if not case:
        # Fold case, then merge the counts of tokens that now collide.
        tc['token'] = tc['token'].str.lower()
        tc = tc.groupby('token', as_index=False).sum()
    tfidf = pd.merge(tc.set_index('token'), idf_df, left_index=True, right_index=True)
    if log_tf:
        # Log-scaled term frequency: log10(1 + count).
        tfidf['TF'] = tfidf['count'].add(1).apply(np.log10)
    else:
        tfidf['TF'] = tfidf['count']
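The snippet breaks off after computing TF, and neither the layout of `idf_df` nor what the `df='PF'` option selects is shown. As a self-contained illustration of the remaining steps, the sketch below assumes a plain Series of document frequencies and uses the common IDF = log10(N / df) definition; both the input layout and that IDF formula are assumptions, not necessarily what the original uses. `tfidf_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def tfidf_sketch(tokencounts, doc_freqs, n_docs):
    """Weight one document's token counts by log-scaled TF times IDF.

    tokencounts: DataFrame with 'token' and 'count' columns.
    doc_freqs:   Series mapping token -> number of documents containing it
                 (hypothetical layout; the original idf_df isn't shown).
    n_docs:      total number of documents in the collection.
    """
    tc = tokencounts.groupby("token")["count"].sum()
    idf = np.log10(n_docs / doc_freqs)
    # Inner join keeps only tokens that have a document frequency.
    out = pd.concat({"count": tc, "IDF": idf}, axis=1, join="inner")
    # Same log-scaled TF as above: log10(1 + count).
    out["TF"] = np.log10(out["count"] + 1)
    out["TFIDF"] = out["TF"] * out["IDF"]
    return out.sort_values("TFIDF", ascending=False)

counts = pd.DataFrame({"token": ["whale", "the", "sea", "whale"], "count": [9, 50, 3, 1]})
doc_freqs = pd.Series({"whale": 10, "the": 1000, "sea": 200})
print(tfidf_sketch(counts, doc_freqs, n_docs=1000))
```

With this IDF definition, a token present in every document gets IDF 0 and sinks to the bottom no matter how often it occurs.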
Quick declination of the head.
Free from weeds.
Thoroughfare: way.
Raw, unprepared.
Enclosed place.
Railway station.
Effect of commutation.
Small bit of bread.
Machine-made net or lace. |