@thm1118
Last active July 15, 2016 13:04
Testing different vector distance metrics, using the implementations provided by sklearn and scipy.
# -*- coding: utf-8 -*-
"""
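Tests the distance functions provided by sklearn, one by one: 'euclidean', 'l2', 'l1', 'manhattan', 'cityblock',
'braycurtis', 'canberra', 'chebyshev', 'correlation',
'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski',
'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
'russellrao', 'seuclidean', 'sokalmichener',
'sokalsneath', 'sqeuclidean', 'yule', "wminkowski",
plus pearson: twenty-odd distance metrics in total.
The vector space model is mostly TF-IDF.
Results:
braycurtis, chebyshev, correlation, cosine, dice, sokalsneath, sqeuclidean and Pearson not only identify identical texts but also clearly separate out completely different ones.
jaccard can only identify exactly identical texts. Euclidean, Manhattan and the like identify identical texts but do not clearly separate completely different ones. The remaining metrics only give a degree of similarity and cannot identify identical texts.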
"""
from sklearn.metrics import pairwise
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
import jieba
import numpy as np
import time
print __doc__
jieba_tokenizer = lambda s: list(jieba.cut(s))
corpus = [
u'Writing II: Rhetorical Composing',
u'Genetics and Society: A Course for Educators',
u'General Game Playing',
u'Genes and the Human Condition (From Behavior to Biotechnology)',
u'A Brief History of Humankind',
u'New Models of Business in Society',
u'Analyse Numrique pour Ingnieurs',
u'Evolution: A Course for Educators',
u'Coding the Matrix: Linear Algebra through Computer Science Applications',
u'The Dynamic Earth: A Course for Educators',
u'Tiny Wings\tYou have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can. ',
u'Angry Birds Free',
u'没有\它很相似',
u'没有\t它很相似',
u'没有\t他很相似',
u'没有\t他不很相似',
u'没有',
u'可以没有',
u'也没有',
u'有没有也不管',
u'Angry Birds Stella',
u'Flappy Wings - FREE\tFly into freedom!A parody of the #1 smash hit game!',
u'没有一个',
u'没有一个2',
]
tfidvectorizer = TfidfVectorizer(tokenizer=jieba_tokenizer)
X = tfidvectorizer.fit_transform(corpus)
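# Printed labels: "特征词表" = feature vocabulary, "向量空间模型" = vector space model.
# The (10, 12) reshape assumes the vocabulary has exactly 120 terms.
print X.shape, u'\n 特征词表', np.reshape(tfidvectorizer.get_feature_names(), (10, 12)), u'\n 向量空间模型', X.toarray()
test = [u'没有']
test_vec = tfidvectorizer.transform(test)
# Printed labels: "测试" = test, "test向量" = test vector.
print u'\n 测试:', test_vec.shape, u'\n test向量', test_vec.toarray()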
_VALID_METRICS = ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock',
'braycurtis', 'canberra', 'chebyshev', 'correlation',
'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski',
'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
'russellrao', 'seuclidean', 'sokalmichener',
'sokalsneath', 'sqeuclidean', 'yule', "wminkowski"]
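# Start the comparison
''' sklearn has its own sparse-matrix implementations of three metrics: Euclidean, Manhattan and cosine. It also accepts the aliases l2, l1 and cityblock for them.
The remaining metrics are delegated to scipy, which does not support sparse matrices and works on dense arrays.
'''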
metrics_for_sparse = ['euclidean', 'manhattan', 'cosine', 'l2', 'l1', 'cityblock']
start = time.time()
# for metric in pairwise.distance_metrics().keys():
for metric in pairwise._VALID_METRICS:
    # These two metrics raise errors here
    if metric in ['mahalanobis', 'wminkowski']:
        continue
    if metric in metrics_for_sparse:
        mx = X
        y = test_vec
    else:
        # scipy-backed metrics need dense input
        mx = X.todense()
        y = test_vec.todense()
    # Header label: "使用距离算法" = using distance metric
    print u'\n ==========使用距离算法:', metric, "==========="
    D = pairwise.pairwise_distances(mx, y, metric=metric)
    # Pair each corpus index with its distance to the test text, nearest first
    dlist = [(i, d[0]) for i, d in enumerate(D)]
    dlist.sort(key=lambda item: item[1])
    for i, d in dlist:
        # "距离" = distance
        print u'距离:', d, u' ', corpus[i]
# "消耗时间" = elapsed time
print u'\n 消耗时间:', time.time()-start
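'''
Use the Pearson correlation coefficient. TF-IDF is not necessarily needed here, because Pearson addresses exactly the problem of large differences in scale between samples: it effectively finds the best-fit line on its own.
'''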
from scipy.stats import pearsonr
countvectorizer = CountVectorizer(tokenizer=jieba_tokenizer)
X = countvectorizer.fit_transform(corpus)
print X.shape, u'\n 特征词表', np.reshape(countvectorizer.get_feature_names(), (10, 12)), u'\n 向量空间模型', X.toarray()
test_vec = countvectorizer.transform(test)
print u'\n 测试:', test_vec.shape, u'\n test向量', test_vec.toarray()
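dicts = {}
for i, row in enumerate(X):
    # pearsonr returns (correlation coefficient, two-sided p-value)
    rvalue = pearsonr(row.toarray().ravel(), test_vec.toarray().ravel())
    dicts[i] = rvalue
dlist = [(i, dicts[i][0], dicts[i][1]) for i in dicts.keys()]
# Highest correlation first
dlist.sort(key=lambda item: item[1], reverse=True)
# Header label: "使用皮尔逊相关系数" = using the Pearson correlation coefficient
print u'\n========使用皮尔逊相关系数 ======================'
for i, r, pvalue in dlist:
    # "相关系数" = correlation coefficient (the earlier label "让步率", odds ratio, was a misnomer)
    print u'相关系数:', r, u' ', pvalue, ' ', corpus[i]
print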
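''' Compute pairwise distances between the rows of the matrix; returns an (n_samples, n_samples) distance matrix (Euclidean by default, not correlation coefficients). '''
from pprint import pprint


def arrayToList(arr):
    # Recursively convert an array or nested list into nested lists of 2-decimal strings
    if isinstance(arr, np.ndarray):
        return arrayToList(arr.tolist())
    elif isinstance(arr, list):
        return [arrayToList(a) for a in arr]
    else:
        return '{:.2f}'.format(arr)


def prettyArray(arr):
    pprint(arrayToList(arr), width=800)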
D = pairwise.pairwise_distances(X)
np.set_printoptions(precision=3)
prettyArray(D)

thm1118 commented Mar 4, 2015

==========使用距离算法: euclidean ===========
距离: 0.0 没有
距离: 0.97639095729 也没有
距离: 0.97639095729 没有一个
距离: 1.02221419296 可以没有
距离: 1.11625739566 没有一个2
距离: 1.15412177776 没有 它很相似
距离: 1.15412177776 没有 他很相似
距离: 1.17892433254 没有\它很相似
距离: 1.19618985384 没有 他不很相似
距离: 1.41421356237 General Game Playing
距离: 1.41421356237 Angry Birds Free
距离: 1.41421356237 Writing II: Rhetorical Composing
距离: 1.41421356237 Genetics and Society: A Course for Educators
距离: 1.41421356237 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.41421356237 A Brief History of Humankind
距离: 1.41421356237 New Models of Business in Society
距离: 1.41421356237 Analyse Numrique pour Ingnieurs
距离: 1.41421356237 Evolution: A Course for Educators
距离: 1.41421356237 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.41421356237 The Dynamic Earth: A Course for Educators
距离: 1.41421356237 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.41421356237 有没有也不管
距离: 1.41421356237 Angry Birds Stella
距离: 1.41421356237 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
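
The repeated 1.41421356237 above is √2. TfidfVectorizer L2-normalizes each row by default, so every document vector has length 1; two unit vectors that share no terms are orthogonal, and their Euclidean distance is √(1 + 1) = √2. A minimal pure-Python sketch of that effect, using made-up term weights rather than values from this run:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical term weights: the two documents have no term in common.
query = l2_normalize([2.0, 0.0, 0.0, 0.0])
other = l2_normalize([0.0, 1.5, 0.7, 0.3])

print(euclidean(query, other))  # sqrt(2) up to floating-point rounding
```

Because every unrelated text lands at the same distance, the Euclidean ranking cannot order them among themselves, but the gap between about 1.2 and √2 is also narrow compared with the 0-to-1 range of the bounded metrics.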

==========使用距离算法: l2 ===========
距离: 0.0 没有
距离: 0.97639095729 也没有
距离: 0.97639095729 没有一个
距离: 1.02221419296 可以没有
距离: 1.11625739566 没有一个2
距离: 1.15412177776 没有 它很相似
距离: 1.15412177776 没有 他很相似
距离: 1.17892433254 没有\它很相似
距离: 1.19618985384 没有 他不很相似
距离: 1.41421356237 General Game Playing
距离: 1.41421356237 Angry Birds Free
距离: 1.41421356237 Writing II: Rhetorical Composing
距离: 1.41421356237 Genetics and Society: A Course for Educators
距离: 1.41421356237 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.41421356237 A Brief History of Humankind
距离: 1.41421356237 New Models of Business in Society
距离: 1.41421356237 Analyse Numrique pour Ingnieurs
距离: 1.41421356237 Evolution: A Course for Educators
距离: 1.41421356237 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.41421356237 The Dynamic Earth: A Course for Educators
距离: 1.41421356237 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.41421356237 有没有也不管
距离: 1.41421356237 Angry Birds Stella
距离: 1.41421356237 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: l1 ===========
距离: 0.0 没有
距离: 1.32879953846 也没有
距离: 1.32879953846 没有一个
距离: 1.40107144189 可以没有
距离: 1.93045648815 没有一个2
距离: 2.54251188821 没有 它很相似
距离: 2.54251188821 没有 他很相似
距离: 2.58378870985 没有\它很相似
距离: 2.72930515794 有没有也不管
距离: 2.83776301225 没有 他不很相似
距离: 2.99515330809 General Game Playing
距离: 2.9963267553 Angry Birds Stella
距离: 2.99980975696 Angry Birds Free
距离: 3.22312907644 Analyse Numrique pour Ingnieurs
距离: 3.32156170743 A Brief History of Humankind
距离: 3.41353909228 Writing II: Rhetorical Composing
距离: 3.46604251153 New Models of Business in Society
距离: 3.47649793573 Evolution: A Course for Educators
距离: 3.59142837483 The Dynamic Earth: A Course for Educators
距离: 3.60599308735 Genetics and Society: A Course for Educators
距离: 3.81445339933 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 3.96660175197 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 4.22776125682 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 4.35438098566 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: manhattan ===========
距离: 0.0 没有
距离: 1.32879953846 也没有
距离: 1.32879953846 没有一个
距离: 1.40107144189 可以没有
距离: 1.93045648815 没有一个2
距离: 2.54251188821 没有 它很相似
距离: 2.54251188821 没有 他很相似
距离: 2.58378870985 没有\它很相似
距离: 2.72930515794 有没有也不管
距离: 2.83776301225 没有 他不很相似
距离: 2.99515330809 General Game Playing
距离: 2.9963267553 Angry Birds Stella
距离: 2.99980975696 Angry Birds Free
距离: 3.22312907644 Analyse Numrique pour Ingnieurs
距离: 3.32156170743 A Brief History of Humankind
距离: 3.41353909228 Writing II: Rhetorical Composing
距离: 3.46604251153 New Models of Business in Society
距离: 3.47649793573 Evolution: A Course for Educators
距离: 3.59142837483 The Dynamic Earth: A Course for Educators
距离: 3.60599308735 Genetics and Society: A Course for Educators
距离: 3.81445339933 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 3.96660175197 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 4.22776125682 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 4.35438098566 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: cityblock ===========
距离: 0.0 没有
距离: 1.32879953846 也没有
距离: 1.32879953846 没有一个
距离: 1.40107144189 可以没有
距离: 1.93045648815 没有一个2
距离: 2.54251188821 没有 它很相似
距离: 2.54251188821 没有 他很相似
距离: 2.58378870985 没有\它很相似
距离: 2.72930515794 有没有也不管
距离: 2.83776301225 没有 他不很相似
距离: 2.99515330809 General Game Playing
距离: 2.9963267553 Angry Birds Stella
距离: 2.99980975696 Angry Birds Free
距离: 3.22312907644 Analyse Numrique pour Ingnieurs
距离: 3.32156170743 A Brief History of Humankind
距离: 3.41353909228 Writing II: Rhetorical Composing
距离: 3.46604251153 New Models of Business in Society
距离: 3.47649793573 Evolution: A Course for Educators
距离: 3.59142837483 The Dynamic Earth: A Course for Educators
距离: 3.60599308735 Genetics and Society: A Course for Educators
距离: 3.81445339933 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 3.96660175197 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 4.22776125682 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 4.35438098566 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: braycurtis ===========
距离: 0.0 没有
距离: 0.559386142429 也没有
距离: 0.559386142429 没有一个
距离: 0.594644521067 可以没有
距离: 0.719131966923 没有一个2
距离: 0.791932770422 没有 它很相似
距离: 0.791932770422 没有 他很相似
距离: 0.808969464636 没有\它很相似
距离: 0.832947539201 没有 他不很相似
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: canberra ===========
距离: 0.0 没有
距离: 1.31291285634 也没有
距离: 1.31291285634 没有一个
距离: 1.35360210644 可以没有
距离: 2.45244894926 没有一个2
距离: 4.0 有没有也不管
距离: 4.49924873277 没有 它很相似
距离: 4.49924873277 没有 他很相似
距离: 4.53248636344 没有\它很相似
距离: 5.0 General Game Playing
距离: 5.0 Angry Birds Free
距离: 5.0 Angry Birds Stella
距离: 5.5569473943 没有 他不很相似
距离: 6.0 Analyse Numrique pour Ingnieurs
距离: 7.0 Writing II: Rhetorical Composing
距离: 7.0 A Brief History of Humankind
距离: 8.0 New Models of Business in Society
距离: 8.0 Evolution: A Course for Educators
距离: 10.0 Genetics and Society: A Course for Educators
距离: 10.0 The Dynamic Earth: A Course for Educators
距离: 12.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 13.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 20.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 55.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
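
The whole-number Canberra values for the unrelated texts have a simple cause. Canberra sums |u_i − v_i| / (|u_i| + |v_i|) over the coordinates, counting a 0/0 term as 0. When two vectors share no terms, every nonzero coordinate contributes exactly 1, so the distance is just the number of distinct tokens in the two texts combined. The sketch below illustrates this with invented weights; it is a plain re-implementation for illustration, not the scipy code:

```python
def canberra(u, v):
    """Canberra distance; coordinates where both values are 0 add nothing."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


# Hypothetical vectors with no shared terms.
query = [1.0, 0.0, 0.0, 0.0, 0.0]
short_doc = [0.0, 0.7, 0.7, 0.0, 0.0]
long_doc = [0.0, 0.5, 0.5, 0.5, 0.5]

print(canberra(query, short_doc))  # 3.0
print(canberra(query, long_doc))   # 5.0
```

For disjoint texts the metric therefore measures length rather than similarity, which is why the long "Tiny Wings" description ends up farthest away.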

==========使用距离算法: chebyshev ===========
距离: 0.0 没有
距离: 0.665998538949 没有 它很相似
距离: 0.665998538949 没有 他很相似
距离: 0.693603418355 没有一个2
距离: 0.694931290928 没有\它很相似
距离: 0.715435083213 没有 他不很相似
距离: 0.852129887718 也没有
距离: 0.852129887718 没有一个
距离: 0.87861051374 可以没有
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: correlation ===========
距离: 0.0 没有
距离: 0.481885149749 也没有
距离: 0.481885149749 没有一个
距离: 0.528177713113 可以没有
距离: 0.631143647492 没有一个2
距离: 0.676440193223 没有 它很相似
距离: 0.676440193223 没有 他很相似
距离: 0.706056031627 没有\它很相似
距离: 0.727730114443 没有 他不很相似
距离: 1.01465505955 有没有也不管
距离: 1.01697999481 General Game Playing
距离: 1.01699032453 Angry Birds Stella
距离: 1.01702098869 Angry Birds Free
距离: 1.01899911448 Analyse Numrique pour Ingnieurs
距离: 1.01987901353 A Brief History of Humankind
距离: 1.02070597436 Writing II: Rhetorical Composing
距离: 1.02118018228 New Models of Business in Society
距离: 1.02127480707 Evolution: A Course for Educators
距离: 1.02231929564 The Dynamic Earth: A Course for Educators
距离: 1.02245224048 Genetics and Society: A Course for Educators
距离: 1.02437018042 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.02578901722 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.02826566385 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.02948682495 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
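
Correlation distance is 1 minus the Pearson coefficient of the two vectors. Unlike cosine, it subtracts each vector's mean first, so two sparse non-negative vectors with no shared terms come out slightly negatively correlated, and the distance lands a little above 1 and grows with the number of nonzero entries. A small sketch of that behaviour, with hypothetical vectors over 120 features (the vocabulary size implied by the `(10, 12)` reshape in the script):

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    dev_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    dev_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (dev_u * dev_v)


def correlation_distance(u, v):
    return 1.0 - pearson_r(u, v)


def unit_vector(n_features, nonzero_indices):
    """Unit-length vector with equal weight on the given indices (illustrative)."""
    weight = 1.0 / math.sqrt(len(nonzero_indices))
    return [weight if i in nonzero_indices else 0.0 for i in range(n_features)]


N = 120
query = unit_vector(N, {0})
few_terms = unit_vector(N, {1, 2, 3})
many_terms = unit_vector(N, set(range(1, 21)))

print(correlation_distance(query, few_terms))   # slightly above 1
print(correlation_distance(query, many_terms))  # a bit higher still
```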

==========使用距离算法: cosine ===========
距离: 0.0 没有
距离: 0.476669650739 也没有
距离: 0.476669650739 没有一个
距离: 0.522460928147 可以没有
距离: 0.623015286688 没有一个2
距离: 0.665998538949 没有 它很相似
距离: 0.665998538949 没有 他很相似
距离: 0.694931290928 没有\它很相似
距离: 0.715435083213 没有 他不很相似
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
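
On unit-length vectors, cosine distance and squared Euclidean distance carry the same information: ‖u − v‖² = 2 − 2·(u·v) = 2 × (cosine distance). The log bears this out: the sqeuclidean value for 也没有 further down, 0.953339301477, is twice the cosine value 0.476669650739 shown here. A short check of the identity, with illustrative weights only:

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sq_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical TF-IDF-like rows, normalized to unit length.
u = l2_normalize([1.0, 0.0, 0.0])
v = l2_normalize([0.6, 0.9, 0.0])

print(cosine_distance(u, v))
print(sq_euclidean(u, v) / 2.0)  # same value up to rounding
```

So on L2-normalized TF-IDF rows the two metrics produce the same ranking; only the scale differs.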

==========使用距离算法: dice ===========
距离: 0.0 没有
距离: 0.333333333333 可以没有
距离: 0.333333333333 也没有
距离: 0.333333333333 没有一个
距离: 0.5 没有一个2
距离: 0.666666666667 没有\它很相似
距离: 0.666666666667 没有 它很相似
距离: 0.666666666667 没有 他很相似
距离: 0.714285714286 没有 他不很相似
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: hamming ===========
距离: 0.0 没有
距离: 0.0166666666667 可以没有
距离: 0.0166666666667 也没有
距离: 0.0166666666667 没有一个
距离: 0.025 没有一个2
距离: 0.0333333333333 有没有也不管
距离: 0.0416666666667 General Game Playing
距离: 0.0416666666667 Angry Birds Free
距离: 0.0416666666667 没有\它很相似
距离: 0.0416666666667 没有 它很相似
距离: 0.0416666666667 没有 他很相似
距离: 0.0416666666667 Angry Birds Stella
距离: 0.05 Analyse Numrique pour Ingnieurs
距离: 0.05 没有 他不很相似
距离: 0.0583333333333 Writing II: Rhetorical Composing
距离: 0.0583333333333 A Brief History of Humankind
距离: 0.0666666666667 New Models of Business in Society
距离: 0.0666666666667 Evolution: A Course for Educators
距离: 0.0833333333333 Genetics and Society: A Course for Educators
距离: 0.0833333333333 The Dynamic Earth: A Course for Educators
距离: 0.1 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 0.108333333333 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 0.166666666667 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 0.458333333333 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
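
Hamming distance is the fraction of coordinates whose values differ at all, regardless of by how much. With 120 features, 0.0166666666667 is 2/120: for 可以没有, the coordinate for 可以 differs (0 versus a positive weight) and so does the one for 没有 (weight 1.0 in the query versus a smaller weight in the two-token text). A sketch with made-up weights:

```python
def hamming(u, v):
    """Fraction of positions where the two vectors hold different values."""
    return sum(1 for a, b in zip(u, v) if a != b) / float(len(u))


N = 120  # vocabulary size implied by the (10, 12) reshape in the script

# Hypothetical weights: index 0 stands for the shared term, index 1 for the extra term.
query = [1.0] + [0.0] * (N - 1)
doc = [0.52, 0.85] + [0.0] * (N - 2)

print(hamming(query, doc))
```

Since shared zeros count as agreement, the result mostly reflects how many distinct tokens a text has, which matches the log: long texts rank far away and short ones near, whatever their content.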

==========使用距离算法: jaccard ===========
距离: 0.0 没有
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 没有\它很相似
距离: 1.0 没有 它很相似
距离: 1.0 没有 他很相似
距离: 1.0 没有 他不很相似
距离: 1.0 可以没有
距离: 1.0 也没有
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 1.0 没有一个
距离: 1.0 没有一个2
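
The all-or-nothing Jaccard result is what one would expect if the metric compared the raw TF-IDF weights rather than presence/absence flags: among the coordinates that are nonzero in either vector, it counts those whose values are not equal, and normalized weights of a shared term almost never match exactly. The sketch below assumes that definition (inferred from the numbers above, not copied from scipy) and contrasts it with the same formula applied to binarized vectors:

```python
def jaccard_real(u, v):
    """Share of unequal coordinates among those nonzero in either vector."""
    union = [(a, b) for a, b in zip(u, v) if a != 0 or b != 0]
    if not union:
        return 0.0
    return sum(1 for a, b in union if a != b) / float(len(union))


def jaccard_binary(u, v):
    """Same formula applied to presence/absence flags."""
    return jaccard_real([x != 0 for x in u], [x != 0 for x in v])


# Hypothetical weights: the term at index 0 is shared, but with different weights.
query = [1.0, 0.0, 0.0]
doc = [0.52, 0.85, 0.0]

print(jaccard_real(query, doc))    # 1.0
print(jaccard_binary(query, doc))  # 0.5
```

Binarizing the vectors first would likely make Jaccard behave more like the dice and sokalsneath results shown elsewhere in this log.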

==========使用距离算法: kulsinski ===========
距离: 0.991666666667 没有
距离: 0.99173553719 可以没有
距离: 0.99173553719 也没有
距离: 0.99173553719 没有一个
距离: 0.991803278689 没有一个2
距离: 0.991935483871 没有\它很相似
距离: 0.991935483871 没有 它很相似
距离: 0.991935483871 没有 他很相似
距离: 0.992 没有 他不很相似
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: matching ===========
距离: 0.0 没有
距离: 0.00833333333333 可以没有
距离: 0.00833333333333 也没有
距离: 0.00833333333333 没有一个
距离: 0.0166666666667 没有一个2
距离: 0.0333333333333 没有\它很相似
距离: 0.0333333333333 没有 它很相似
距离: 0.0333333333333 没有 他很相似
距离: 0.0333333333333 有没有也不管
距离: 0.0416666666667 General Game Playing
距离: 0.0416666666667 Angry Birds Free
距离: 0.0416666666667 没有 他不很相似
距离: 0.0416666666667 Angry Birds Stella
距离: 0.05 Analyse Numrique pour Ingnieurs
距离: 0.0583333333333 Writing II: Rhetorical Composing
距离: 0.0583333333333 A Brief History of Humankind
距离: 0.0666666666667 New Models of Business in Society
距离: 0.0666666666667 Evolution: A Course for Educators
距离: 0.0833333333333 Genetics and Society: A Course for Educators
距离: 0.0833333333333 The Dynamic Earth: A Course for Educators
距离: 0.1 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 0.108333333333 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 0.166666666667 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 0.458333333333 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.

==========使用距离算法: minkowski ===========
距离: 0.0 没有
距离: 0.97639095729 也没有
距离: 0.97639095729 没有一个
距离: 1.02221419296 可以没有
距离: 1.11625739566 没有一个2
距离: 1.15412177776 没有 它很相似
距离: 1.15412177776 没有 他很相似
距离: 1.17892433254 没有\它很相似
距离: 1.19618985384 没有 他不很相似
距离: 1.41421356237 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.41421356237 Writing II: Rhetorical Composing
距离: 1.41421356237 General Game Playing
距离: 1.41421356237 Angry Birds Free
距离: 1.41421356237 Genetics and Society: A Course for Educators
距离: 1.41421356237 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.41421356237 A Brief History of Humankind
距离: 1.41421356237 New Models of Business in Society
距离: 1.41421356237 Analyse Numrique pour Ingnieurs
距离: 1.41421356237 Evolution: A Course for Educators
距离: 1.41421356237 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.41421356237 The Dynamic Earth: A Course for Educators
距离: 1.41421356237 有没有也不管
距离: 1.41421356237 Angry Birds Stella
距离: 1.41421356237 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: rogerstanimoto ===========
距离: 0.0 没有
距离: 0.0165289256198 可以没有
距离: 0.0165289256198 也没有
距离: 0.0165289256198 没有一个
距离: 0.0327868852459 没有一个2
距离: 0.0645161290323 没有\它很相似
距离: 0.0645161290323 没有 它很相似
距离: 0.0645161290323 没有 他很相似
距离: 0.0645161290323 有没有也不管
距离: 0.08 General Game Playing
距离: 0.08 Angry Birds Free
距离: 0.08 没有 他不很相似
距离: 0.08 Angry Birds Stella
距离: 0.0952380952381 Analyse Numrique pour Ingnieurs
距离: 0.110236220472 Writing II: Rhetorical Composing
距离: 0.110236220472 A Brief History of Humankind
距离: 0.125 New Models of Business in Society
距离: 0.125 Evolution: A Course for Educators
距离: 0.153846153846 Genetics and Society: A Course for Educators
距离: 0.153846153846 The Dynamic Earth: A Course for Educators
距离: 0.181818181818 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 0.195488721805 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 0.285714285714 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 0.628571428571 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
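
Rogers-Tanimoto works on presence/absence flags: with R = 2 × (number of mismatching positions), the dissimilarity is R / (matching positions + R), where positions that are absent in both vectors count as matches. One mismatch out of 120 features gives 2 / (119 + 2) = 2/121 ≈ 0.0165289, the value shown for 可以没有. A minimal sketch on hypothetical vectors:

```python
def rogers_tanimoto(u, v):
    """Rogers-Tanimoto dissimilarity on the nonzero patterns of u and v."""
    flags_u = [x != 0 for x in u]
    flags_v = [x != 0 for x in v]
    mismatches = sum(1 for a, b in zip(flags_u, flags_v) if a != b)
    matches = len(flags_u) - mismatches  # includes positions absent in both
    r = 2.0 * mismatches
    return r / (matches + r)


N = 120  # vocabulary size implied by the (10, 12) reshape in the script

query = [1.0] + [0.0] * (N - 1)        # one token
doc = [0.52, 0.85] + [0.0] * (N - 2)   # the same token plus one more

print(rogers_tanimoto(query, doc))
```

Because shared absences dominate in a sparse vocabulary, the score depends mainly on how many tokens differ, so an unrelated three-word title can tie with a related Chinese sentence, as the 0.08 rows above show.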

==========使用距离算法: russellrao ===========
距离: 0.991666666667 没有\它很相似
距离: 0.991666666667 没有 它很相似
距离: 0.991666666667 没有 他很相似
距离: 0.991666666667 没有 他不很相似
距离: 0.991666666667 没有
距离: 0.991666666667 可以没有
距离: 0.991666666667 也没有
距离: 0.991666666667 没有一个
距离: 0.991666666667 没有一个2
距离: 1.0 Writing II: Rhetorical Composing
距离: 1.0 Genetics and Society: A Course for Educators
距离: 1.0 General Game Playing
距离: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 1.0 A Brief History of Humankind
距离: 1.0 New Models of Business in Society
距离: 1.0 Analyse Numrique pour Ingnieurs
距离: 1.0 Evolution: A Course for Educators
距离: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 1.0 The Dynamic Earth: A Course for Educators
距离: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
距离: 1.0 Angry Birds Free
距离: 1.0 有没有也不管
距离: 1.0 Angry Birds Stella
距离: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

==========使用距离算法: seuclidean ===========
距离: 0.0 没有
距离: 4.42260372687 没有一个
距离: 4.59785172192 也没有
距离: 5.28308093238 可以没有
距离: 6.1669750727 没有一个2
距离: 6.70948430848 没有 它很相似
距离: 6.77527966549 没有 他很相似
距离: 7.44257926548 没有\它很相似
距离: 7.76827444607 没有 他不很相似
距离: 7.91842198277 Angry Birds Stella
距离: 7.96476674302 Angry Birds Free
距离: 8.24123532701 有没有也不管
距离: 9.27535123042 General Game Playing
距离: 9.73776091877 Evolution: A Course for Educators
距离: 10.0200332662 Genetics and Society: A Course for Educators
距离: 10.4320152092 The Dynamic Earth: A Course for Educators
距离: 10.6065344221 A Brief History of Humankind
距离: 10.6337812588 Analyse Numrique pour Ingnieurs
距离: 11.0445980521 Writing II: Rhetorical Composing
距离: 11.7789774118 New Models of Business in Society
距离: 14.9845105962 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 15.9758201275 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 17.6592852567 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 32.6379134454 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.

==========使用距离算法: sokalmichener ===========
距离: 0.0 没有
距离: 0.0165289256198 可以没有
距离: 0.0165289256198 也没有
距离: 0.0165289256198 没有一个
距离: 0.0327868852459 没有一个2
距离: 0.0645161290323 没有\它很相似
距离: 0.0645161290323 没有 它很相似
距离: 0.0645161290323 没有 他很相似
距离: 0.0645161290323 有没有也不管
距离: 0.08 General Game Playing
距离: 0.08 Angry Birds Free
距离: 0.08 没有 他不很相似
距离: 0.08 Angry Birds Stella
距离: 0.0952380952381 Analyse Numrique pour Ingnieurs
距离: 0.110236220472 Writing II: Rhetorical Composing
距离: 0.110236220472 A Brief History of Humankind
距离: 0.125 New Models of Business in Society
距离: 0.125 Evolution: A Course for Educators
距离: 0.153846153846 Genetics and Society: A Course for Educators
距离: 0.153846153846 The Dynamic Earth: A Course for Educators
距离: 0.181818181818 Coding the Matrix: Linear Algebra through Computer Science Applications
距离: 0.195488721805 Genes and the Human Condition (From Behavior to Biotechnology)
距离: 0.285714285714 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
距离: 0.628571428571 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.

========== Using distance metric: sokalsneath ===========
Distance: 0.0 没有
Distance: 0.666666666667 可以没有
Distance: 0.666666666667 也没有
Distance: 0.666666666667 没有一个
Distance: 0.8 没有一个2
Distance: 0.888888888889 没有\它很相似
Distance: 0.888888888889 没有 它很相似
Distance: 0.888888888889 没有 他很相似
Distance: 0.909090909091 没有 他不很相似
Distance: 1.0 Writing II: Rhetorical Composing
Distance: 1.0 Genetics and Society: A Course for Educators
Distance: 1.0 General Game Playing
Distance: 1.0 Genes and the Human Condition (From Behavior to Biotechnology)
Distance: 1.0 A Brief History of Humankind
Distance: 1.0 New Models of Business in Society
Distance: 1.0 Analyse Numrique pour Ingnieurs
Distance: 1.0 Evolution: A Course for Educators
Distance: 1.0 Coding the Matrix: Linear Algebra through Computer Science Applications
Distance: 1.0 The Dynamic Earth: A Course for Educators
Distance: 1.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
Distance: 1.0 Angry Birds Free
Distance: 1.0 有没有也不管
Distance: 1.0 Angry Birds Stella
Distance: 1.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!

========== Using distance metric: sqeuclidean ===========
Distance: 0.0 没有
Distance: 0.953339301477 也没有
Distance: 0.953339301477 没有一个
Distance: 1.04492185629 可以没有
Distance: 1.24603057338 没有一个2
Distance: 1.3319970779 没有 它很相似
Distance: 1.3319970779 没有 他很相似
Distance: 1.38986258186 没有\它很相似
Distance: 1.43087016643 没有 他不很相似
Distance: 2.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
Distance: 2.0 Writing II: Rhetorical Composing
Distance: 2.0 General Game Playing
Distance: 2.0 Angry Birds Free
Distance: 2.0 Genetics and Society: A Course for Educators
Distance: 2.0 Genes and the Human Condition (From Behavior to Biotechnology)
Distance: 2.0 A Brief History of Humankind
Distance: 2.0 New Models of Business in Society
Distance: 2.0 Analyse Numrique pour Ingnieurs
Distance: 2.0 Evolution: A Course for Educators
Distance: 2.0 Coding the Matrix: Linear Algebra through Computer Science Applications
Distance: 2.0 The Dynamic Earth: A Course for Educators
Distance: 2.0 有没有也不管
Distance: 2.0 Angry Birds Stella
Distance: 2.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
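
The ceiling of exactly 2.0 for documents that share no term with the query is what one would expect if the tf-idf rows are L2-normalized (the `TfidfVectorizer` default, assumed here since the vectorizer settings are not shown in this output). For unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so orthogonal vectors land at 2.0. A minimal Python 3 sketch with hypothetical three-term vectors:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sqeuclidean(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dot(u, v):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical weights over a three-term vocabulary (not taken from the gist).
query = l2_normalize([1.0, 0.0, 0.0])
overlapping = l2_normalize([1.0, 1.0, 0.0])  # shares one term with the query
disjoint = l2_normalize([0.0, 0.0, 3.0])     # shares no term with the query

for doc in (query, overlapping, disjoint):
    # The two printed numbers agree up to floating-point rounding.
    print(sqeuclidean(query, doc), 2.0 - 2.0 * dot(query, doc))
```

Under that assumption, sqeuclidean and cosine distance rank these documents identically; only the scale differs.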

========== Using distance metric: yule ===========
Distance: 0.0 没有\它很相似
Distance: 0.0 没有 它很相似
Distance: 0.0 没有 他很相似
Distance: 0.0 没有 他不很相似
Distance: 0.0 没有
Distance: 0.0 可以没有
Distance: 0.0 也没有
Distance: 0.0 没有一个
Distance: 0.0 没有一个2
Distance: 2.0 Writing II: Rhetorical Composing
Distance: 2.0 Genetics and Society: A Course for Educators
Distance: 2.0 General Game Playing
Distance: 2.0 Genes and the Human Condition (From Behavior to Biotechnology)
Distance: 2.0 A Brief History of Humankind
Distance: 2.0 New Models of Business in Society
Distance: 2.0 Analyse Numrique pour Ingnieurs
Distance: 2.0 Evolution: A Course for Educators
Distance: 2.0 Coding the Matrix: Linear Algebra through Computer Science Applications
Distance: 2.0 The Dynamic Earth: A Course for Educators
Distance: 2.0 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
Distance: 2.0 Angry Birds Free
Distance: 2.0 有没有也不管
Distance: 2.0 Angry Birds Stella
Distance: 2.0 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
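
The all-or-nothing yule output follows from the boolean Yule dissimilarity, 2 * n_tf * n_ft / (n_tt * n_ff + n_tf * n_ft), when the query contains a single term: any document containing that term has n_tf = 0 and scores 0.0, and any document lacking it has n_tt = 0 and scores 2.0. The sketch below is plain Python 3 with a hypothetical helper `yule` and toy four-term vectors; returning 0.0 when the numerator is zero is a choice made here to avoid dividing by zero.

```python
def yule(u, v):
    """Yule dissimilarity between two equal-length boolean vectors."""
    ntt = ntf = nft = nff = 0
    for a, b in zip(u, v):
        a, b = bool(a), bool(b)
        if a and b:
            ntt += 1      # term present in both
        elif a:
            ntf += 1      # present only in u
        elif b:
            nft += 1      # present only in v
        else:
            nff += 1      # absent from both
    half_r = ntf * nft
    if half_r == 0:
        # Covers identical vectors and the case where one set contains the other.
        return 0.0
    return 2.0 * half_r / (ntt * nff + half_r)


# Toy vectors over a four-term vocabulary (illustrative, not from the gist).
query = [1, 0, 0, 0]        # a single-term query
contains_term = [1, 1, 1, 0]
lacks_term = [0, 1, 1, 0]

print(yule(query, contains_term))  # 0.0
print(yule(query, lacks_term))     # 2.0
```

With a one-term query, this metric therefore acts as a containment test rather than a graded similarity, which is why every document containing 没有 ties at 0.0.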

Elapsed time: 0.0260000228882
(24, 120)
Feature vocabulary [[u'\t' u' ' u'!' u'#' u'(' u')' u',' u'-' u'.' u'1' u'2' u':']
[u'' u'a' u'algebra' u'already' u'always' u'analyse' u'and' u'angry'
u'annoying' u'applications' u'are' u'as']
[u'at' u'back' u'beautiful' u'behavior' u'biotechnology' u'birds' u'brief'
u'brings' u'business' u'but' u'can' u'coding']
[u'composing' u'computer' u'condition' u'course' u'down' u'dreamed'
u'dynamic' u'earth' u'educators' u'evolution' u'fast' u'flap']
[u'flappy' u'fly' u'flying' u'for' u'free' u'freedom' u'from' u'full'
u'game' u'general' u'genes' u'genetics']
[u'gravity' u'have' u'hill' u'hills' u'history' u'hit' u'human'
u'humankind' u'ii' u'in' u'ingnieurs' u'into']
[u'is' u'jumps' u'least' u'linear' u'luckily' u'matrix' u'models'
u'moment' u'new' u'next' u'night' u'numrique']
[u'of' u'out' u'parody' u'playing' u'pour' u'rhetorical' u'science'
u'slide' u'smash' u'society' u'stella' u'the']
[u'this' u'through' u'tiny' u'to' u'until' u'use' u'waiting' u'watch'
u'wings' u'world' u'writing' u'you']
[u'your' u'\u4e00\u4e2a' u'\u4e0d' u'\u4e0d\u7ba1' u'\u4e5f' u'\u4ed6'
u'\u53ef\u4ee5' u'\u5b83' u'\u5f88' u'\u6709\u6ca1\u6709' u'\u6ca1\u6709'
u'\u76f8\u4f3c']]
Vector space model [[ 0 3 0 ..., 0 0 0]
[ 0 6 0 ..., 0 0 0]
[ 0 2 0 ..., 0 0 0]
...,
[ 1 12 2 ..., 0 0 0]
[ 0 0 0 ..., 0 1 0]
[ 0 0 0 ..., 0 1 0]]

Test: (1, 120)
test vector [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0]]

======== Using Pearson correlation coefficient ======================
Correlation (r, p-value): 1.0 0.0 没有
Correlation (r, p-value): 0.704129476253 2.93408399473e-19 可以没有
Correlation (r, p-value): 0.704129476253 2.93408399473e-19 也没有
Correlation (r, p-value): 0.704129476253 2.93408399473e-19 没有一个
Correlation (r, p-value): 0.572478027908 8.41827470681e-12 没有一个2
Correlation (r, p-value): 0.439633154942 5.0667371983e-07 没有\它很相似
Correlation (r, p-value): 0.439633154942 5.0667371983e-07 没有 它很相似
Correlation (r, p-value): 0.439633154942 5.0667371983e-07 没有 他很相似
Correlation (r, p-value): 0.399579611024 6.1499389981e-06 没有 他不很相似
Correlation (r, p-value): -0.0146789237925 0.873570359825 有没有也不管
Correlation (r, p-value): -0.0160552739119 0.861828497905 Angry Birds Stella
Correlation (r, p-value): -0.0160552739119 0.861828497905 Angry Birds Free
Correlation (r, p-value): -0.0160552739119 0.861828497905 General Game Playing
Correlation (r, p-value): -0.0165079189204 0.857973359174 Analyse Numrique pour Ingnieurs
Correlation (r, p-value): -0.0167056403959 0.856290425532 A Brief History of Humankind
Correlation (r, p-value): -0.0168084737846 0.855415399395 New Models of Business in Society
Correlation (r, p-value): -0.0178387178508 0.846658741618 Coding the Matrix: Linear Algebra through Computer Science Applications
Correlation (r, p-value): -0.0178398481638 0.846649144469 Tiny Wings You have always dreamed of flying - but your wings are tiny. Luckily the world is full of beautiful hills. Use the hills as jumps - slide down, flap your wings and fly! At least for a moment - until this annoying gravity brings you back down to earth. But the next hill is waiting for you already. Watch out for the night and fly as fast as you can.
Correlation (r, p-value): -0.0179991412043 0.845296860219 Genetics and Society: A Course for Educators
Correlation (r, p-value): -0.0179991412043 0.845296860219 The Dynamic Earth: A Course for Educators
Correlation (r, p-value): -0.0181890350986 0.84368538462 Evolution: A Course for Educators
Correlation (r, p-value): -0.0182429819351 0.843227699035 Writing II: Rhetorical Composing
Correlation (r, p-value): -0.0187390850906 0.839021223446 Genes and the Human Condition (From Behavior to Biotechnology)
Correlation (r, p-value): -0.0207042398471 0.822404288436 Flappy Wings - FREE Fly into freedom!A parody of the #1 smash hit game!
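
The first number on each line behaves like a Pearson correlation coefficient between the query's term-count vector and each document's vector over the 120-term vocabulary. The Python 3 sketch below recomputes one of the values from scratch. It assumes binary term counts and a hypothetical column layout, and the helper `pearson_r` is illustrative rather than the gist's code; the second number on each line (taken here to be a p-value) is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Assumed layout: the query has one term; the document has that term plus one more.
n_terms = 120
query = [0] * n_terms
query[0] = 1
doc = list(query)
doc[1] = 1

print(pearson_r(query, query))  # 1.0 up to floating-point rounding
print(pearson_r(query, doc))    # roughly 0.7041
```

Documents that share no term with the query come out slightly negative rather than exactly zero, because both vectors are mostly zeros and the means are subtracted before the products are summed.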
