"""Query AlchemyAPI to determine number of API calls still available"""
# -*- coding: utf-8 -*-
import json
import requests
def get_api_key():
    # Load API key (40 hex character key) from local file
    with open('api_key.txt') as key_file:
        key = key_file.readline().strip()
    return key
alvations / nltk-intro.py
Created October 1, 2015 12:58 — forked from alexbowe/nltk-intro.py
Demonstration of extracting key phrases with NLTK in Python
import nltk
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
# Used when tokenizing words
sentence_re = r'''(?x) # set flag to allow verbose regexps
([A-Z])(\.[A-Z])+\.? # abbreviations, e.g. U.S.A.
"""
Programming task
================
Implement the method iter_sample below to make the Unit test pass. iter_sample
is supposed to peek at the first n elements of an iterator, and determine the
minimum and maximum values (using their comparison operators) found in that
sample. To make it more interesting, the method is supposed to return an
iterator which will return the same exact elements that the original one would
have yielded, i.e. the first n elements can't be missing.
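One possible way to approach the task described above is sketched here. The preview does not show the unit test or the expected signature, so the argument order, the `(minimum, maximum, iterator)` return shape, and the `None` values for an empty sample are all assumptions made for illustration.

```python
from itertools import chain, islice


def iter_sample(it, n):
    """Peek at the first n items of `it`.

    Returns (minimum, maximum, iterator). The returned iterator yields
    every element the original would have yielded, peeked ones included.
    The return shape is an assumption; the original test is not shown.
    """
    it = iter(it)
    # Consume at most n items without touching the rest of the iterator.
    sample = list(islice(it, n))
    # Stitch the consumed items back in front of the remaining ones.
    restored = chain(sample, it)
    if not sample:
        # Assumed behaviour for an empty sample: no min/max available.
        return None, None, restored
    return min(sample), max(sample), restored


lo, hi, rest = iter_sample(iter([3, 1, 4, 1, 5, 9, 2, 6]), 4)
print(lo, hi, list(rest))
```

Only the first n elements are buffered in memory; the remainder of the iterator is still consumed lazily.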

Entropy and WSD.

Let p(x) be the probability mass function of a random variable X over a discrete set of symbols X:

p(x) = P(X=x)

For example, if we toss two fair coins and count the number of heads, we get a random variable with p(0) = 1/4, p(1) = 1/2 and p(2) = 1/4.
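The preview ends before entropy itself is defined. Assuming the note goes on to use the standard Shannon entropy, H(X) = -Σ p(x) log2 p(x), the two-coin example can be checked with a short sketch. The function name `entropy` and the dictionary representation of the probability mass function are illustrative choices, not taken from the original.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


# Number of heads when tossing two fair coins.
two_coins = {0: 1 / 4, 1: 1 / 2, 2: 1 / 4}
print(entropy(two_coins))  # 1.5
```

The result is 1.5 bits: the outcomes 0 and 2 each contribute 1/4 × 2 bits and the outcome 1 contributes 1/2 × 1 bit.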

"""
This is a script used to clean control characters from the
- NTU-Multilingual Corpus (http://web.mysites.ntu.edu.sg/fcbond/open/pubs/2012-ijalp-ntumc.pdf)
- SeedLing Corpus (http://www.aclweb.org/anthology/W/W14/W14-2211.pdf)
- DSL Corpus Collection (https://comparable.limsi.fr/bucc2014/4.pdf)
"""
import re
import unicodedata
# A full list of unicode characters.
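The rest of the script is not shown in the preview. As a rough illustration of the idea, control characters can be dropped by checking each character's Unicode general category. This is a minimal sketch that uses `unicodedata.category` rather than the regex-over-all-characters approach the imports above hint at; the function name `remove_control_chars` and the `keep` parameter are hypothetical.

```python
import unicodedata


def remove_control_chars(text, keep="\n\t"):
    """Drop characters in Unicode category Cc, except those in `keep`.

    Hypothetical helper: the original script's approach is not shown.
    """
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


print(repr(remove_control_chars("a\x00b\x07c\n")))
```

Newlines and tabs are themselves category Cc, which is why they are whitelisted by default; pass `keep=""` to strip them as well.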

Euclidean Distance vs Cosine Similarity (Time)

import time
import numpy as np

for i in range(10):
	start = time.time()
	for _ in range(10000):  # separate name so the outer loop's i is not shadowed
		a, b = np.random.rand(100), np.random.rand(100)
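The snippet above is cut off before either metric is computed. The comparison it sets up might look roughly like the following dependency-free sketch, which uses plain Python lists instead of NumPy arrays so that it runs with the standard library alone. The function names and the `time_metric` helper are assumptions for illustration, and the timings it prints say nothing about the NumPy versions.

```python
import math
import random
import time


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def time_metric(metric, pairs):
    """Seconds taken to apply `metric` to every (a, b) pair."""
    start = time.perf_counter()
    for a, b in pairs:
        metric(a, b)
    return time.perf_counter() - start


# Generate the vectors once so both metrics see identical input
# and vector creation is excluded from the measurement.
pairs = [
    ([random.random() for _ in range(100)],
     [random.random() for _ in range(100)])
    for _ in range(1000)
]
print("euclidean:", time_metric(euclidean_distance, pairs))
print("cosine:   ", time_metric(cosine_similarity, pairs))
```

Building the random vectors outside the timed region is a deliberate choice: in the truncated loop above, vector generation happens inside the timed loop and would be counted in the measurement.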
alvations / bulba-parser.rb
Created July 3, 2016 16:04 — forked from meew0/bulba-parser.rb
Ruby script to parse a dump of Bulbapedia's Pokémon pages into obtainability data
# This script parses a dump of Bulbapedia's Pokémon pages into a JSON file
# with details about what Pokémon are obtainable in respective regions
# (specifically, the latest series of games set in a specific region).
require 'nokogiri'
require 'json'
# An XML dump of all of Bulbapedia's Pokémon pages is required to exist at
# this path. It can be generated using this special page:
# http://bulbapedia.bulbagarden.net/wiki/Special:Export
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
An implementation of the *FastNet* from
Armand Joulin, Edouard Grave, Piotr Bojanowski and Tomas Mikolov. 2016.
Bag of Tricks for Efficient Text Classification.
https://arxiv.org/pdf/1607.01759v2.pdf
Largely based on RaRe Technologies' `gensim`
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py
êê... i do n't ...
êê look at that . look at that .
êê okay , that 's good . that 's good .
êê that woman ! that woman !
êê琌瑍 ︾ ┍ that 's a laundromat .
êê⊿ 闽玒 . th-that 's okay .
- êび - good .
ê  . good .
êび good
êタ good
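Tang Yan (唐嫣) is a Chinese actress. She was born on December 6, 1983 in Shanghai. In 2006 she graduated from the undergraduate acting program of the Central Academy of Drama.
In 2001 she won the national championship of the third Shulei Century Star competition. In 2004 she was chosen by Zhang Yimou as an "Olympic Baby" and took part in China's 8-minute performance at the closing ceremony. She gained attention for starring in the TV series 《仙剑奇侠传三》 and 《夏家三千金》. In 2012 she founded Tang Yan Studio and served as producer of the micro film 《逐爱之旅》, in which she also starred.
In 2015 she starred in several hit dramas and served as promotion ambassador for the sixth China University Student Television Festival and as spokesperson for the 2015 国剧盛典. In 2016 she starred in the Chinese-Korean co-production 《赏金猎人》, whose box office passed 200 million, and in the fantasy comedy film 《大话西游3》, whose box office exceeded 360 million. In the same year she became the Golden Eagle Goddess of the 11th China Golden Eagle TV Art Festival and starred in the costume drama of female court intrigue 《锦绣未央》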