@zhuth
zhuth / gist:3413534
Created August 21, 2012 08:34
Matlab interface for gmeans
function [L C] = gmeans(datafile, clusters)
if ~ischar(datafile),
save('~gmeans.prep', 'datafile', '-ascii');
datafile='~gmeans.prep';
end;
tool_bin = '..\..\tools\bin\';
system([tool_bin 'LDAInputConverter.exe lsa2gmeans "' datafile '" "~gmeans.txt" > nul']);
system([tool_bin 'gmeans.exe -F t -c ' num2str(clusters) ' "~gmeans.txt" > nul']);
data=importdata(['~gmeans.txt_doctoclus.' num2str(clusters)]);
n=data(1);
zhuth / WordDetectorMain.cs
Created September 1, 2012 08:14
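A C# program that implements an automatic Chinese word-extraction algorithm, based on the method described in this post: http://www.matrix67.com/blog/archives/5044. It borrows heavily from @lastland's Python implementation and should run somewhat faster.
// This is a CsSC script file. To use standard C#, remove the #function and #endfunction directives and put the remaining code in a Main function that takes string[] args as its parameter.
// To run it directly, go to https://github.com/zhuth/Tools/ , download it, and extract bin\CsSC.exe and its related files.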
#reference System.Core.dll
#function
public class WordDetector {
public Action ProcessOver = null;
zhuth / gist:3755629
Created September 20, 2012 12:37
Use a Bayesian average to find hot words in frequency data
// written in CsSC
#reference System.Core.dll;
#using System.Linq;
var names = Directory.GetFiles("D:\\temp\\fq\\", "*.fq");
var words = new Dictionary<string, double>();
var ps = new Dictionary<string, double>();
var total_words = new Dictionary<string, int>();
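The preview above is cut off before the scoring step, so the following is only a minimal Python sketch of a Bayesian average in general, not the gist's actual code. The function names, the `prior_weight` parameter, and the sample data are assumptions made for illustration. The idea is that a word seen in only a few files is pulled toward the overall mean, so a single extreme observation does not dominate the ranking.

```python
def bayesian_average(values, prior_mean, prior_weight):
    """Shrink the mean of `values` toward `prior_mean`.

    `prior_weight` acts like a number of pseudo-observations that all
    equal `prior_mean` (a hypothetical tuning parameter).
    """
    return (prior_weight * prior_mean + sum(values)) / (prior_weight + len(values))


def rank_hot_words(observations, prior_weight=5.0):
    """Rank words by the Bayesian average of their observed frequencies.

    `observations` maps each word to a list of per-file frequencies.
    The prior mean is taken as the mean over all observations (an assumption).
    """
    all_values = [v for values in observations.values() for v in values]
    prior_mean = sum(all_values) / len(all_values)
    scores = {
        word: bayesian_average(values, prior_mean, prior_weight)
        for word, values in observations.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: "a" has one very high observation, "b" is consistently high.
sample = {"a": [1.0], "b": [0.8] * 10, "c": [0.1] * 10}
ranking = rank_hot_words(sample)
```

With a plain mean, "a" would rank first on the strength of one observation; with the shrinkage above, the consistently frequent "b" ranks ahead of it.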
zhuth / gj.csv
Created October 5, 2012 04:04
Random walk on bus network.
// This is a CsSC script.
int STEPS = 33;
string[] lines = File.ReadAllLines("gj.csv", Encoding.GetEncoding("GBK"));
Dictionary<string, List<string>> dc = new Dictionary<string, List<string>>();
Dictionary<string, List<string>> dx = new Dictionary<string, List<string>>();
foreach(string line in lines) {
string[] cols = line.Split('\t', ',');
string name = cols[0];
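The script preview ends before the walk itself, so here is a minimal Python sketch of a fixed-length random walk on a graph, under the assumption that the network is held as an adjacency mapping from each stop to its neighbouring stops. The stop names and the `random_walk` helper are hypothetical; only the step count of 33 comes from the preview.

```python
import random


def random_walk(adjacency, start, steps, rng):
    """Return the list of stops visited in a random walk of up to `steps` moves.

    `adjacency` maps a stop name to the list of stops reachable from it.
    The walk ends early if it reaches a stop with no neighbours.
    """
    path = [start]
    for _ in range(steps):
        neighbors = adjacency.get(path[-1], [])
        if not neighbors:
            break
        path.append(rng.choice(neighbors))
    return path


# Hypothetical three-stop network; STEPS mirrors the value in the preview.
STEPS = 33
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
walk = random_walk(network, "A", STEPS, random.Random(0))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when comparing visit counts across experiments.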
zhuth / douban_book_estimator.user.js
Created November 5, 2012 06:06
Douban Books: display "山倒-抽丝-即焚" and the estimated reading-completion date
// ==UserScript==
// @name douban book estimator
// @description douban book reading
// @namespace http://tianhua.me/
// @auth break
// @version 0.122
// @license Public Domain
// @include http://www.douban.com/*
// @include http://book.douban.com/*
// ==/UserScript==
zhuth / gist:4058466
Created November 12, 2012 10:05
Collection of 133,337 ancient texts, part 01: automatically extracted word frequencies
东坡 19
元亨 683
利贞 1391
初九 1585
潜龙 253
潜龙勿用 142
勿用 703
非天下之至 74
天下 8634
其孰能 96
zhuth / gist:4086330
Created November 16, 2012 10:42
People's Daily 199801 corpus: automatically constructed dictionary compared with the manual one
-1: automatic only; 1: manual only; 0: both
计划生育 1
深夜 1
家家 1
区分 1
一线 1
东北部 1
物价局 1
后来 1
青岛市 1
zhuth / gist:4108587
Created November 19, 2012 02:10
Run pdflatex, then bibtex, then pdflatex twice to get correct PDF output. Used with TeXworks. CsSC script.
var pdflatex = @"D:\Programs\CTEX\MiKTeX\miktex\bin\pdflatex.exe";
var bibtex = @"D:\Programs\CTEX\MiKTeX\miktex\bin\bibtex.exe";
string[] argpdf = new string[args.Length - 1];
for (int i = 0; i < args.Length - 1; ++i) argpdf[i] = args[i];
start(pdflatex, argpdf);
start(bibtex, new string[]{args[args.Length - 1]});
start(pdflatex, argpdf);
start(pdflatex, argpdf);
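The same four-step sequence can be sketched in Python. This is an illustration rather than a port of the script: the `build_commands` and `run_build` helpers are hypothetical, and the sketch assumes `pdflatex` and `bibtex` are on the PATH instead of using the hard-coded MiKTeX paths. As in the script, the last argument goes to bibtex and the rest go to pdflatex.

```python
import subprocess


def build_commands(args, pdflatex="pdflatex", bibtex="bibtex"):
    """Return the command lines for pdflatex, bibtex, pdflatex, pdflatex.

    All but the last element of `args` are passed to pdflatex; the last
    element is passed to bibtex.
    """
    latex = [pdflatex, *args[:-1]]
    return [list(latex), [bibtex, args[-1]], list(latex), list(latex)]


def run_build(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical invocation for a document named paper.tex.
commands = build_commands(["-interaction=nonstopmode", "paper.tex", "paper"])
```

Calling `run_build(commands)` would execute the sequence; the second and third pdflatex runs are what let the bibliography and then the citation references resolve.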
zhuth / doubanbib.user.js
Created November 26, 2012 05:24
Simple BibTeX generator plugin for Douban Books (Chrome/FF+GM)
// ==UserScript==
// @name douban book bibtex generator
// @description douban book bibtex generator
// @namespace http://tianhua.me/
// @auth zhuth
// @version 0.1
// @license Public Domain
// @include http://book.douban.com/subject/*
// ==/UserScript==
zhuth / 亲属关系知识库.txt
Created December 6, 2012 06:07
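Kinship knowledge base
#Knowledge base format:
# Comments start with #
# Predicate definition: predicate_name(arg1, arg2) := [@temporary variables] other predicates
# Note: the predicate R(x, y) reads as "x is the R of y".
# The temporary variable list is comma-separated and must not contain spaces.
#Atomic predicate definitions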
elder(x, y) := atom
child(x, y) := atom
marry(x, y) := atom
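To make the format concrete, here is a minimal Python sketch of a line parser for it. It reads only the header shape shown in the comments above (`name(params) := [@temps] body`). The exact grammar of the body and of the temporary-variable list is not visible in the preview, so the sketch makes assumptions: the temporary list is a single space-free token starting with `@`, both ASCII and full-width commas are accepted as separators, and the body is kept as an unparsed string. The `parse_definition` name and the `grandparent` example are hypothetical.

```python
import re

# name(param1, param2) := rest-of-line
DEFINITION = re.compile(r"^(\w+)\(([^)]*)\)\s*:=\s*(.*)$")


def parse_definition(line):
    """Parse one knowledge-base line; return None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a predicate definition: {line!r}")
    name, params, rest = match.groups()
    temp_vars = []
    if rest.startswith("@"):
        # Assumed syntax: "@z,w" as one token, followed by the body.
        head, _, rest = rest.partition(" ")
        temp_vars = [v for v in re.split("[,,]", head[1:]) if v]
    body = rest.strip()
    return {
        "name": name,
        "params": [p.strip() for p in params.split(",")],
        "temp_vars": temp_vars,
        "body": body,
        "atomic": body == "atom",
    }


atom_rule = parse_definition("elder(x, y) := atom")
# Hypothetical derived predicate, for illustration only.
derived_rule = parse_definition("grandparent(x, y) := @z child(z, x) child(y, z)")
```

Keeping the body as a plain string avoids guessing at how derived predicates are combined, which the visible part of the file does not show.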