Let p(x) be the probability mass function of a random variable X over a discrete set of symbols 𝒳:
p(x) = P(X = x)
For example, if we toss two fair coins and count the number of heads, we obtain a random variable X with p(0) = 1/4, p(1) = 1/2, and p(2) = 1/4.
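The coin example can be checked by enumerating all equally likely outcomes. The following is a minimal sketch, assuming fair and independent coins; the helper name `heads_pmf` is hypothetical and not part of the original note.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_pmf(num_coins=2):
    """Return {k: P(X = k)}, where X counts heads in `num_coins` fair tosses."""
    # Enumerate every equally likely outcome, e.g. ('H', 'T').
    outcomes = list(product("HT", repeat=num_coins))
    # Count how many outcomes produce each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


pmf = heads_pmf(2)
for k, prob in pmf.items():
    print(f"p({k}) = {prob}")
```

Exact fractions are used instead of floats so that the probabilities can be verified to sum to exactly 1.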
"""Query AlchemyAPI to determine number of API calls still available"""
# -*- coding: utf-8 -*-
import json
import requests


def get_api_key():
    # Load API key (40 HEX character key) from local file
    with open('api_key.txt') as key_file:
        key = key_file.readline().strip()
    return key
import nltk
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
# Used when tokenizing words
sentence_re = r'''(?x) # set flag to allow verbose regexps
([A-Z])(\.[A-Z])+\.? # abbreviations, e.g. U.S.A.
"""
Programming task
================
Implement the method iter_sample below to make the Unit test pass. iter_sample
is supposed to peek at the first n elements of an iterator, and determine the
minimum and maximum values (using their comparison operators) found in that
sample. To make it more interesting, the method is supposed to return an
iterator which will return the same exact elements that the original one would
have yielded, i.e. the first n elements can't be missing.
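One possible way to approach the task described above is to buffer the first n elements and chain them back in front of the remaining iterator. This is only a sketch: the original signature and unit test are not shown, so the argument order and the return shape `(minimum, maximum, iterator)` are assumptions.

```python
from itertools import chain, islice


def iter_sample(iterator, n):
    """Peek at the first n elements; return (min, max, equivalent iterator).

    Assumed return shape: the preview above does not show the real signature.
    """
    it = iter(iterator)
    # Consume at most n elements into a buffer.
    head = list(islice(it, n))
    if not head:
        # Nothing to sample; hand back an (empty) iterator unchanged.
        return None, None, it
    # Re-attach the buffered elements in front of the untouched remainder.
    return min(head), max(head), chain(head, it)


lo, hi, restored = iter_sample(iter([3, 1, 4, 1, 5, 9, 2, 6]), 3)
print(lo, hi, list(restored))
```

Because only the first n elements are buffered, the rest of the iterator stays lazy, which matters for large or infinite inputs.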
"""
This is a script used to clean control characters from the
- NTU-Multilingual Corpus (http://web.mysites.ntu.edu.sg/fcbond/open/pubs/2012-ijalp-ntumc.pdf)
- SeedLing Corpus (http://www.aclweb.org/anthology/W/W14/W14-2211.pdf)
- DSL Corpus Collection (https://comparable.limsi.fr/bucc2014/4.pdf)
"""
import re
import unicodedata
# A full list of unicode characters.
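The rest of this script is not shown, so the following is only a sketch of one common way to remove control characters, not the script's actual method. It assumes that "control character" means Unicode general category `Cc` and that newlines and tabs should be kept; the function name `strip_control_chars` is hypothetical.

```python
import unicodedata


def strip_control_chars(text, keep="\n\t"):
    """Drop characters in Unicode category Cc, except those listed in `keep`."""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


print(repr(strip_control_chars("a\x00b\x07c\n")))
```

Filtering by category rather than by a hand-written character list avoids missing less common control codes such as `\x7f`.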
# This script parses a dump of Bulbapedia's Pokémon pages into a JSON file
# with details about what Pokémon are obtainable in respective regions
# (specifically, the latest series of games set in a specific region).
require 'nokogiri'
require 'json'
# An XML dump of all of Bulbapedia's Pokémon pages is required to exist at
# this path. It can be generated using this special page:
# http://bulbapedia.bulbagarden.net/wiki/Special:Export
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
An implementation of the *FastNet* from
Armand Joulin, Edouard Grave, Piotr Bojanowski and Tomas Mikolov. 2016.
Bag of Tricks for Efficient Text Classification.
https://arxiv.org/pdf/1607.01759v2.pdf
Largely based on RaRe Technologies' `gensim`
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py
... i do n't ...
look at that . look at that .
okay , that 's good . that 's good .
that woman ! that woman !
that 's a laundromat .
th-that 's okay .
- good .
good .
good
good
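Tang Yan (唐嫣) is a Chinese actress. She was born on December 6, 1983, in Shanghai. In 2006 she graduated from the undergraduate acting program of the Central Academy of Drama.
In 2001 she won the national championship of the third Shulei Century Star competition. In 2004 she was personally selected by Zhang Yimou as an "Olympic Baby" and took part in the "China 8 Minutes" performance at the closing ceremony. She drew attention for starring in the TV dramas 《仙剑奇侠传三》 (Chinese Paladin 3) and 《夏家三千金》. In 2012 she founded the Tang Yan Studio and served as producer of the micro film 《逐爱之旅》, in which she also starred.
In 2015 she starred in several hit dramas and served as promotion ambassador for the 6th China University Student Television Festival and as spokesperson for the 2015 国剧盛典 (TV drama awards ceremony). In 2016 she starred in the China-Korea co-production 《赏金猎人》 (Bounty Hunters), whose box office passed 200 million, and in the fantasy comedy film 《大话西游3》 (A Chinese Odyssey Part Three), whose box office exceeded 360 million; in the same year she became the Golden Eagle Goddess of the 11th China Golden Eagle TV Art Festival and starred in the costume drama of women's political intrigue 《锦绣未央》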