Chunlin TIAN foowaa

foowaa / vertibi.py
Created December 4, 2018 07:57
Python implementation of the Viterbi algorithm
# from hankcs
'''
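Finding the most likely hidden-state sequence is one of the three classic HMM problems, and it is usually solved with the Viterbi algorithm. Viterbi finds the shortest path through the HMM when every edge costs -log(prob), which is the same as finding the path of maximum probability.
Define V[day][today's weather] = probability. Here "today's weather" means: given that the weather of every earlier day has already been fixed to its most probable value, the probability that today's weather is X. This probability is therefore a running product.
Because my friend went for a walk on the first day, the probability that day 1 was rainy is V[day 1][Rainy] = start probability[Rainy] * emission probability[Rainy][walk] = 0.6 * 0.1 = 0.06; likewise V[day 1][Sunny] = 0.24. This matches intuition: the friend went out on day 1 and usually prefers to walk when it is sunny, so a sunny day 1 is more likely.
From day 2 onward, for each weather Y, every possible previous-day weather X gives a candidate: V[previous day][X] * P(X transitions to Y) * P(the friend's activity that day | Y). With two possible values of X, Y has two candidates; the larger one becomes V[day 2][Y], and today's weather Y is appended to the best path ending in the chosen X.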
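A minimal Python sketch of the procedure described above, written for this note rather than copied from the gist (its code is not shown here). The start probabilities and the "walk" emissions follow the numbers in the text (Rainy: 0.6 and 0.1; with two states the Sunny start is 0.4, so V[day 1][Sunny] = 0.24 implies a Sunny "walk" emission of 0.6). The transition probabilities and the "shop"/"clean" emissions are assumptions borrowed from the classic Wikipedia weather example.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, state sequence) of the most likely hidden path."""
    # Day 1: V[0][s] = start probability * emission probability of the first observation
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # path[s] holds the best state sequence that ends in state s
    path = {s: [s] for s in states}

    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for y in states:
            # Try every predecessor x: V[t-1][x] * P(x -> y) * P(obs[t] | y), keep the largest
            prob, best_x = max(
                (V[t - 1][x] * trans_p[x][y] * emit_p[y][obs[t]], x) for x in states
            )
            V[t][y] = prob
            # Extend the best path that ended in x with today's state y
            new_path[y] = path[best_x] + [y]
        path = new_path

    prob, last = max((V[-1][s], s) for s in states)
    return prob, path[last]


# Example model: start and 'walk' emissions follow the text; the rest are assumed values
states = ('Rainy', 'Sunny')
start_p = {'Rainy': 0.6, 'Sunny': 0.4}
trans_p = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
           'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}
emit_p = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
          'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}

prob, best_path = viterbi(['walk', 'shop', 'clean'], states, start_p, trans_p, emit_p)
# best_path == ['Sunny', 'Rainy', 'Rainy'], prob is approximately 0.01344
```

On long sequences the running product underflows; the -log(prob) shortest-path view from the text corresponds to summing negative log-probabilities and taking the minimum instead of the maximum.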
foowaa / train NN.py
Last active December 13, 2018 18:27
training procedure
'''
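num_epochs: number of epochs to run
train: training data
dev: validation (dev) set
evalp: evaluate on the dev set every evalp epochs
model: the model
metric_best: best metric recorded so far
metric_stop: metric value at which training stops
cnt_stop: stop training once the dev metric has failed to beat metric_best this many times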
'''
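One way to read those parameters as an early-stopping loop. The gist's body is not shown, so this is a hedged sketch: `train_one_epoch` and `evaluate` are hypothetical callables supplied by the caller, a higher metric is assumed to be better, `metric_stop` is interpreted as "stop once the dev metric reaches this value", and `cnt_stop` is treated as a patience counter over consecutive evaluations.

```python
def train_loop(num_epochs, train, dev, evalp, model, train_one_epoch, evaluate,
               metric_best=float('-inf'), metric_stop=None, cnt_stop=3):
    """Train with periodic dev evaluation and early stopping.

    Returns (metric_best, last_epoch_run). Assumes a higher metric is better.
    """
    misses = 0   # consecutive evaluations that failed to beat metric_best
    epoch = 0
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(model, train)
        if epoch % evalp != 0:          # only evaluate every `evalp` epochs
            continue
        metric = evaluate(model, dev)
        if metric > metric_best:
            metric_best = metric
            misses = 0
        else:
            misses += 1
        if metric_stop is not None and metric >= metric_stop:
            break                       # target metric reached
        if misses >= cnt_stop:
            break                       # dev metric stopped improving
    return metric_best, epoch


# Demo with hypothetical stand-ins: no real training, dev scores come from a list
scores = iter([0.5, 0.6, 0.55, 0.58, 0.59, 0.7])
best, stopped_at = train_loop(num_epochs=10, train=None, dev=None, evalp=1, model=None,
                              train_one_epoch=lambda m, d: None,
                              evaluate=lambda m, d: next(scores), cnt_stop=3)
# best == 0.6, stopped_at == 5 (three evaluations in a row without beating 0.6)
```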
foowaa / logging.py
Last active December 19, 2018 07:49
# Levels, from most to least severe: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET
import logging

logging.basicConfig(level=logging.DEBUG,
                    filename="mylog.log",
                    format='%(asctime)s %(levelname)s %(filename)s %(funcName)s %(lineno)d %(message)s',
                    filemode='w')  # 'w' overwrites the log file on every run
logger = logging.getLogger(__name__)
# Raise the threshold for one noisy module so only its CRITICAL messages get through:
# logging.getLogger("module_name").setLevel(logging.CRITICAL)
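The same format string can be checked without writing `mylog.log`. This sketch sends records to an in-memory `io.StringIO` through a `logging.StreamHandler` instead of calling `basicConfig` (which configures the root logger only once per process), and uses made-up logger names `demo_app` and `demo_app.noisy_module` to show the per-module `setLevel` trick from the last comment.

```python
import io
import logging

FORMAT = '%(asctime)s %(levelname)s %(filename)s %(funcName)s %(lineno)d %(message)s'

# Capture output in memory instead of a file
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter(FORMAT))

app = logging.getLogger("demo_app")        # hypothetical application logger
app.setLevel(logging.DEBUG)
app.addHandler(handler)
app.propagate = False                      # keep the demo independent of root config

# Child logger for a hypothetical noisy module: only CRITICAL records pass
noisy = logging.getLogger("demo_app.noisy_module")
noisy.setLevel(logging.CRITICAL)


def work():
    app.debug("starting work")
    noisy.warning("this is suppressed")    # below CRITICAL, dropped by the child logger
    noisy.critical("this gets through")    # propagates to demo_app's handler


work()
output = buf.getvalue()
```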
foowaa / is_generator.py
Last active December 5, 2018 16:30
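Check whether an object is a generator (strictly, any iterator that returns itself from iter())
def is_generator(obj):
    # Iterators, generators included, return themselves from iter(); containers return a new iterator.
    return iter(obj) is iter(obj)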
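The identity test really detects iterators: any object whose `__iter__` returns itself passes, generators included, while containers such as lists hand back a fresh iterator each time (and non-iterables raise `TypeError`). If only true generator objects should count, the standard library's `inspect.isgenerator` is stricter, and `isinstance(obj, collections.abc.Iterator)` is the conventional iterator check. A small comparison:

```python
import inspect
from collections.abc import Iterator


def is_generator(obj):
    # True for any iterator: iter() on an iterator returns the same object
    return iter(obj) is iter(obj)


gen = (x * x for x in range(3))   # generator expression
lst = [0, 1, 4]                   # plain container
it = iter(lst)                    # list iterator: an iterator, but not a generator

# Each tuple: (identity test, Iterator ABC check, inspect.isgenerator)
results = {
    "gen": (is_generator(gen), isinstance(gen, Iterator), inspect.isgenerator(gen)),
    "list": (is_generator(lst), isinstance(lst, Iterator), inspect.isgenerator(lst)),
    "list_iterator": (is_generator(it), isinstance(it, Iterator), inspect.isgenerator(it)),
}
```

The `list_iterator` row is where the three checks disagree: the identity test and the ABC check say yes, `inspect.isgenerator` says no.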
'''
corpus is a list of lists of strings (one list of tokens per document),
for example:
[['human', 'interface', 'computer'],
['survey', 'user', 'computer', 'system', 'response', 'time'],
['eps', 'user', 'interface', 'system'],
['system', 'human', 'system', 'eps'],
['user', 'response', 'time'],
['trees'],
['graph', 'trees'],
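The gist's processing code is not visible here. As an illustration of the data structure only, this standard-library sketch turns the visible rows into a token-to-id map and per-document `(id, count)` pairs, a bag-of-words layout similar to what gensim's `Dictionary.doc2bow` produces; the id assignment (order of first appearance) is an assumption, not the gist's code.

```python
from collections import Counter

# The visible rows of the corpus: one list of tokens per document
corpus = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
]

# Assign each distinct token an integer id in order of first appearance
token2id = {}
for doc in corpus:
    for tok in doc:
        token2id.setdefault(tok, len(token2id))

# Bag-of-words: each document becomes a sorted list of (token_id, count) pairs
bow = [sorted(Counter(token2id[t] for t in doc).items()) for doc in corpus]
# bow[3] == [(0, 1), (5, 2), (8, 1)]  ('human' once, 'system' twice, 'eps' once)
```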
foowaa / is_number.py
Created December 6, 2018 08:20
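Decide whether a string represents a number
def is_number(s):
    # Plain numerals, decimals and exponents: '3.14', '1e5', ' 42 '
    try:
        float(s)
        return True
    except ValueError:
        pass
    # Single Unicode numeric characters such as '½' or '四'
    try:
        import unicodedata
        unicodedata.numeric(s)
        return True
    except (TypeError, ValueError):
        pass
    return False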
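A few probe strings show what the two-stage check accepts; the function is repeated so the block runs on its own. Note that `float()` also accepts `'nan'` and `'inf'`, and `unicodedata.numeric()` only takes a single character (it raises `TypeError` otherwise), so a multi-character string like `'四五'` ends up `False`.

```python
import unicodedata


def is_number(s):
    try:
        float(s)                  # ASCII numerals, exponents, 'nan', 'inf'
        return True
    except ValueError:
        pass
    try:
        unicodedata.numeric(s)    # a single Unicode character with a numeric value
        return True
    except (TypeError, ValueError):
        pass
    return False


probes = ["3.14", "1e5", "nan", "½", "四", "abc", "四五"]
verdicts = {p: is_number(p) for p in probes}
```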
for j in range(len(s)-1, -1, -1):
    pass
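The loop above walks the indices of `s` from last to first: `range(len(s) - 1, -1, -1)` stops before -1, so index 0 is included. `reversed(range(len(s)))` yields the same indices and is often easier to read. A quick check with a made-up `s`:

```python
s = "abc"  # example string, chosen for illustration

# range(start, stop, step): start at the last index, stop before -1, step backwards
backward = [j for j in range(len(s) - 1, -1, -1)]

# Equivalent formulation
also_backward = list(reversed(range(len(s))))

chars = [s[j] for j in backward]
# backward == [2, 1, 0], chars == ['c', 'b', 'a']
```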
import logging


class LogMixin(object):
    # Gives every subclass a logger named "<defining module>.<ClassName>"
    @property
    def logger(self):
        name = '.'.join([__name__, self.__class__.__name__])
        return logging.getLogger(name)
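import re

# Strip every non-word character (anything other than letters, digits and underscore).
# A raw string is needed: '\W' in a normal string literal is an invalid escape sequence.
st_after = re.sub(r'\W', '', st_before)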
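In Python 3, `\W` on a `str` pattern is Unicode-aware, so CJK characters count as word characters and survive; passing `flags=re.ASCII` keeps only ASCII letters, digits and underscore. The sample string is made up for illustration.

```python
import re

st_before = "Hello, 世界! 123_ok"  # example input

# Unicode-aware (default for str patterns): punctuation and spaces removed, CJK kept
st_after = re.sub(r'\W', '', st_before)

# ASCII-only word characters: the CJK characters are removed as well
ascii_only = re.sub(r'\W', '', st_before, flags=re.ASCII)
# st_after == 'Hello世界123_ok', ascii_only == 'Hello123_ok'
```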
import os
import torch.utils.data as data
from tqdm import tqdm


def train():
    # config saving: checkpoint directory and saving interval
    model_path = './ckpt'
    os.makedirs(model_path, exist_ok=True)  # no error if the directory already exists
    save_step = 2
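The visible part of `train()` stops after the checkpoint settings, so how `save_step` is used is not shown. One plausible reading, offered as an assumption, is "save every `save_step` epochs into `model_path`". The sketch takes the save function as a parameter; with PyTorch this would typically be `torch.save(model.state_dict(), path)`, while the demo uses `pickle` so it runs without PyTorch. `maybe_save` and the `epoch_{n}.ckpt` file-name pattern are hypothetical.

```python
import os
import pickle
import tempfile


def pickle_save(state, path):
    # Stand-in for torch.save(state, path) so the demo runs without PyTorch
    with open(path, "wb") as f:
        pickle.dump(state, f)


def maybe_save(epoch, save_step, model_path, state, save_fn=pickle_save):
    """Save `state` every `save_step` epochs (1-indexed); return the path or None."""
    if epoch % save_step != 0:
        return None
    os.makedirs(model_path, exist_ok=True)
    path = os.path.join(model_path, "epoch_{}.ckpt".format(epoch))
    save_fn(state, path)
    return path


with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "ckpt")
    saved = [maybe_save(epoch, 2, model_path, {"epoch": epoch}) for epoch in range(1, 6)]
    saved = [p for p in saved if p is not None]
    saved_names = [os.path.basename(p) for p in saved]
    files_exist = all(os.path.exists(p) for p in saved)
```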