Last active
August 29, 2015 14:01
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:bfcac85dc85e057702a6ed3aa8bbb09fe82669662db1788df85405b059261b2d" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Jaccard Similarity with Blaze" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Jaccard similarity measures relative connectedness between pairs of nodes in a graph. It is the number of neighbors that the two nodes have in common (mutual friends) divided by the number of neighbors that either node has (friends of either node).\n",
"\n", | |
"$$ J(i,j) = \\frac{|n(i) \\cap n(j)|}{|n(i) \\cup n(j)| } \\;\\;\n", | |
"n(i) = \\textrm{Neighbor set of node}_i$$" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We compute directed Jaccard similarity, using each node's out-neighbors as its neighbor set, on a small portion of the web graph."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Data" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't have the data, download it with:\n",
"\n", | |
" wget http://webdatacommons.org/hyperlinkgraph/data/example_index\n", | |
" wget http://webdatacommons.org/hyperlinkgraph/data/example_arcs" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from blaze.data.csv import CSV\n", | |
"\n", | |
"# An index mapping website names to numeric identifiers\n", | |
"\n", | |
"index_dd = CSV('example_index', columns=['name', 'id'])\n", | |
"list(index_dd.py[:5]) # The first five index entries" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 1, | |
"text": [ | |
"[(u'1000notes.com', 0L),\n", | |
" (u'100500.tv', 1L),\n", | |
" (u'abebooks.com', 2L),\n", | |
" (u'abebooks.de', 3L),\n", | |
" (u'amazon-presse.de', 4L)]" | |
] | |
} | |
], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# Hyperlinks between webpages\n", | |
"\n", | |
"arcs_dd = CSV('example_arcs', columns=['source', 'destination'])\n", | |
"list(arcs_dd.py[:10]) # The first ten edges" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 2, | |
"text": [ | |
"[(7L, 5L),\n", | |
" (7L, 6L),\n", | |
" (7L, 8L),\n", | |
" (7L, 9L),\n", | |
" (7L, 10L),\n", | |
" (7L, 12L),\n", | |
" (7L, 26L),\n", | |
" (7L, 57L),\n", | |
" (7L, 70L),\n", | |
" (7L, 82L)]" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Symbolics\n", | |
"\n", | |
"In this section we define Jaccard similarity symbolically. In the next section we apply our symbolic results to data." | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we create two abstract tables around our data."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from blaze.expr.table import *\n", | |
"\n", | |
"index = TableSymbol('index', index_dd.schema) # mapping urls to IDs\n", | |
"arcs = TableSymbol('arcs', arcs_dd.schema) # Connections between website IDs" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We compute the out-degree of each node, a simple split-apply-combine computation." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"out_degree = By(arcs,\n", | |
" arcs['source'],\n", | |
" arcs['destination'].count()).relabel({'destination': 'degree'})" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Split-apply-combine can be compounded with `Join` to form relatively complex computations. Here we compute the number of out-neighbors shared by any two nodes." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"joined = Join(arcs.relabel({'source': 'a'}),\n", | |
" arcs.relabel({'source': 'b'}),\n", | |
" 'destination')\n", | |
"\n", | |
"shared = By(joined, \n", | |
" joined[['a', 'b']], \n", | |
" joined['destination'].count()).relabel({'destination': 'shared'})\n", | |
"\n", | |
"shared = shared[shared['a'] < shared['b']] # avoid double entries" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 5 | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From there, computing Jaccard similarity takes two more Joins followed by some basic arithmetic: the size of the union is `a_degree + b_degree - shared`."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"degrees = Join(out_degree, shared, 'source', 'a').relabel({'degree': 'a_degree', 'source': 'a'})\n", | |
"degrees = Join(out_degree, degrees, 'source', 'b').relabel({'degree': 'b_degree', 'source': 'b'})\n", | |
"degrees = degrees[['a', 'b', 'a_degree', 'b_degree', 'shared']]\n", | |
"\n", | |
"jaccard = (degrees['shared'] / (degrees['a_degree'] + degrees['b_degree'] - degrees['shared'])).label('jaccard')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 6 | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We massage the result: swap in website URLs for numeric identifiers, remove self-connections, and sort to obtain our final computation."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"result = collect(degrees[['a', 'b']], jaccard)\n", | |
"result = result[result['a'] != result['b']]\n", | |
"result = Join(index, result, 'id', 'a').relabel({'name': 'source_name'})\n", | |
"result = Join(index, result, 'id', 'b').relabel({'name': 'destination_name'})\n", | |
"result = result[['source_name', 'destination_name', 'jaccard']]\n", | |
"\n", | |
"result = result.sort('jaccard', ascending=False)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"result.schema" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
"dshape(\"{ source_name : string, destination_name : string, jaccard : float64 }\")" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This symbolic computation was checked as we constructed it, providing rapid feedback to interactive users without the cost of waiting for expensive computations to fail." | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computation\n",
"\n",
"We execute our symbolic computation against normal Python data structures."
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from blaze.compute.python import compute\n", | |
"\n", | |
"arcs_data = list(arcs_dd)\n", | |
"index_data = list(index_dd)\n", | |
"\n", | |
"list(compute(result, {arcs: arcs_data, index: index_data}))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 9, | |
"text": [ | |
"[(u'google.com', u'zoomblog.com', 1.0),\n", | |
" (u'memeorandum.com', u'sapo.pt', 1.0),\n", | |
" (u'kenyaunlimited.com', u'zoomblog.com', 1.0),\n", | |
" (u'google.com', u'memeorandum.com', 1.0),\n", | |
" (u'pjmedia.com', u'zoomblog.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'pjmedia.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'google.com', 1.0),\n", | |
" (u'mesvilaweb.cat', u'pjmedia.com', 1.0),\n", | |
" (u'classicalvalues.com', u'pjmedia.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'sapo.pt', 1.0),\n", | |
" (u'sapo.pt', u'zoomblog.com', 1.0),\n", | |
" (u'mesvilaweb.cat', u'zoomblog.com', 1.0),\n", | |
" (u'classicalvalues.com', u'eltangoysusinvitados.com', 1.0),\n", | |
" (u'blogia.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'blogia.com', u'classicalvalues.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'classicalvalues.com', u'zoomblog.com', 1.0),\n", | |
" (u'mu.nu', u'pjmedia.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'zoomblog.com', 1.0),\n", | |
" (u'blogia.com', u'zoomblog.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'pjmedia.com', 1.0),\n", | |
" (u'memeorandum.com', u'pjmedia.com', 1.0),\n", | |
" (u'google.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'blogia.com', u'memeorandum.com', 1.0),\n", | |
" (u'mu.nu', u'zoomblog.com', 1.0),\n", | |
" (u'blogia.com', u'eltangoysusinvitados.com', 1.0),\n", | |
" (u'google.com', u'sapo.pt', 1.0),\n", | |
" (u'mesvilaweb.cat', u'sapo.pt', 1.0),\n", | |
" (u'kenyaunlimited.com', u'mu.nu', 1.0),\n", | |
" (u'google.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'classicalvalues.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'memeorandum.com', 1.0),\n", | |
" (u'google.com', u'mu.nu', 1.0),\n", | |
" (u'blogia.com', u'sapo.pt', 1.0),\n", | |
" (u'blogia.com', u'pjmedia.com', 1.0),\n", | |
" (u'classicalvalues.com', u'sapo.pt', 1.0),\n", | |
" (u'classicalvalues.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'blogia.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'memeorandum.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'mesvilaweb.cat', u'mu.nu', 1.0),\n", | |
" (u'pjmedia.com', u'sapo.pt', 1.0),\n", | |
" (u'qwika.com', u'wikidict.de', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'sapo.pt', 1.0),\n", | |
" (u'google.com', u'pjmedia.com', 1.0),\n", | |
" (u'classicalvalues.com', u'mu.nu', 1.0),\n", | |
" (u'mu.nu', u'sapo.pt', 1.0),\n", | |
" (u'blogia.com', u'google.com', 1.0),\n", | |
" (u'classicalvalues.com', u'google.com', 1.0),\n", | |
" (u'memeorandum.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'mu.nu', 1.0),\n", | |
" (u'blogia.com', u'mu.nu', 1.0),\n", | |
" (u'memeorandum.com', u'zoomblog.com', 1.0),\n", | |
" (u'memeorandum.com', u'mu.nu', 1.0),\n", | |
" (u'classicalvalues.com', u'memeorandum.com', 1.0),\n", | |
" (u'blogalaxia.com', u'sapo.pt', 0.5),\n", | |
" (u'blogalaxia.com', u'mu.nu', 0.5),\n", | |
" (u'blogalaxia.com', u'mesvilaweb.cat', 0.5),\n", | |
" (u'blogalaxia.com', u'zoomblog.com', 0.5),\n", | |
" (u'blogalaxia.com', u'kenyaunlimited.com', 0.5),\n", | |
" (u'wikipedia.org', u'wiktionary.org', 0.5),\n", | |
" (u'blogalaxia.com', u'pjmedia.com', 0.5),\n", | |
" (u'blogalaxia.com', u'eltangoysusinvitados.com', 0.5),\n", | |
" (u'blogalaxia.com', u'classicalvalues.com', 0.5),\n", | |
" (u'blogalaxia.com', u'memeorandum.com', 0.5),\n", | |
" (u'blogalaxia.com', u'blogia.com', 0.5),\n", | |
" (u'blogalaxia.com', u'google.com', 0.5),\n", | |
" (u'amazon.co.jp', u'amazon.de', 0.4444444444444444),\n", | |
" (u'mu.nu', u'typepad.com', 0.3333333333333333),\n", | |
" (u'google.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'classicalvalues.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'pjmedia.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'eltangoysusinvitados.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'sapo.pt', u'typepad.com', 0.3333333333333333),\n", | |
" (u'mesvilaweb.cat', u'typepad.com', 0.3333333333333333),\n", | |
" (u'typepad.com', u'zoomblog.com', 0.3333333333333333),\n", | |
" (u'kenyaunlimited.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'memeorandum.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'blogia.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'blogalaxia.com', u'typepad.com', 0.25),\n", | |
" (u'amazon.co.jp', u'amazon.com', 0.25),\n", | |
" (u'amazon.com', u'amazon.de', 0.20588235294117646),\n", | |
" (u'blogspot.com', u'wordpress.com', 0.17857142857142858),\n", | |
" (u'sapo.pt', u'wordpress.com', 0.1),\n", | |
" (u'google.com', u'wordpress.com', 0.1),\n", | |
" (u'wordpress.com', u'zoomblog.com', 0.1),\n", | |
" (u'mesvilaweb.cat', u'wordpress.com', 0.1),\n", | |
" (u'eltangoysusinvitados.com', u'wordpress.com', 0.1),\n", | |
" (u'memeorandum.com', u'wordpress.com', 0.1),\n", | |
" (u'mu.nu', u'wordpress.com', 0.1),\n", | |
" (u'kenyaunlimited.com', u'wordpress.com', 0.1),\n", | |
" (u'animationplayhouse.com', u'wordpress.com', 0.1),\n", | |
" (u'classicalvalues.com', u'wordpress.com', 0.1),\n", | |
" (u'wikidict.de', u'wordpress.com', 0.1),\n", | |
" (u'blogia.com', u'wordpress.com', 0.1),\n", | |
" (u'pjmedia.com', u'wordpress.com', 0.1),\n", | |
" (u'qwika.com', u'wordpress.com', 0.1),\n", | |
" (u'blogalaxia.com', u'wordpress.com', 0.09090909090909091),\n", | |
" (u'blogspot.com', u'youtube.com', 0.08333333333333333),\n", | |
" (u'typepad.com', u'wordpress.com', 0.08333333333333333),\n", | |
" (u'tumblr.com', u'wordpress.com', 0.0625),\n", | |
" (u'azspot.net', u'youtube.com', 0.0625),\n", | |
" (u'downthisvideo.com', u'youtube.com', 0.0625),\n", | |
" (u'flickr.com', u'youtube.com', 0.05555555555555555),\n", | |
" (u'blogspot.com', u'wikidict.de', 0.043478260869565216),\n", | |
" (u'azspot.net', u'blogspot.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'qwika.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'over-blog.com', 0.043478260869565216),\n", | |
" (u'animationplayhouse.com', u'blogspot.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'downthisvideo.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'wiktionary.org', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'wikipedia.org', 0.041666666666666664),\n", | |
" (u'blogspot.com', u'flickr.com', 0.04),\n", | |
" (u'blogspot.com', u'typepad.com', 0.04),\n", | |
" (u'amazon.com', u'img-dpreview.com', 0.04),\n", | |
" (u'blogspot.com', u'tumblr.com', 0.034482758620689655),\n", | |
" (u'amazon.com', u'blogspot.com', 0.02127659574468085)]" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This is a small canned dataset, hence the large number of perfect similarities.\n", | |
"\n", | |
"Fortunately our symbolic computation can work both with different datasets and with different backends." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Spark Interaction\n", | |
"\n", | |
"To demonstrate cross-platform computation we execute the same Jaccard computation using Spark on Spark Resilient Distributed Datasets." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import pyspark\n", | |
"sc = pyspark.SparkContext(\"local\", \"Jaccard-demo\") # Just a local Spark context\n", | |
"\n", | |
"arcs_rdd = sc.parallelize(arcs_data) # Distribute our lists over Spark \n", | |
"index_rdd = sc.parallelize(index_data)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"38381\n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from blaze.compute.spark import compute\n", | |
"\n", | |
"rdd_out = compute(result, {arcs: arcs_rdd, index: index_rdd})\n", | |
"rdd_out # RDDs in, RDDs out." | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"PythonRDD[53] at RDD at PythonRDD.scala:37" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"rdd_out.collect()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
"[(u'blogia.com', u'classicalvalues.com', 1.0),\n", | |
" (u'blogia.com', u'eltangoysusinvitados.com', 1.0),\n", | |
" (u'classicalvalues.com', u'eltangoysusinvitados.com', 1.0),\n", | |
" (u'blogia.com', u'google.com', 1.0),\n", | |
" (u'classicalvalues.com', u'google.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'google.com', 1.0),\n", | |
" (u'blogia.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'classicalvalues.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'google.com', u'kenyaunlimited.com', 1.0),\n", | |
" (u'blogia.com', u'memeorandum.com', 1.0),\n", | |
" (u'classicalvalues.com', u'memeorandum.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'memeorandum.com', 1.0),\n", | |
" (u'google.com', u'memeorandum.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'memeorandum.com', 1.0),\n", | |
" (u'blogia.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'classicalvalues.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'google.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'kenyaunlimited.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'memeorandum.com', u'mesvilaweb.cat', 1.0),\n", | |
" (u'blogia.com', u'mu.nu', 1.0),\n", | |
" (u'classicalvalues.com', u'mu.nu', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'mu.nu', 1.0),\n", | |
" (u'google.com', u'mu.nu', 1.0),\n", | |
" (u'kenyaunlimited.com', u'mu.nu', 1.0),\n", | |
" (u'memeorandum.com', u'mu.nu', 1.0),\n", | |
" (u'mesvilaweb.cat', u'mu.nu', 1.0),\n", | |
" (u'blogia.com', u'pjmedia.com', 1.0),\n", | |
" (u'classicalvalues.com', u'pjmedia.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'pjmedia.com', 1.0),\n", | |
" (u'google.com', u'pjmedia.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'pjmedia.com', 1.0),\n", | |
" (u'memeorandum.com', u'pjmedia.com', 1.0),\n", | |
" (u'mesvilaweb.cat', u'pjmedia.com', 1.0),\n", | |
" (u'mu.nu', u'pjmedia.com', 1.0),\n", | |
" (u'blogia.com', u'sapo.pt', 1.0),\n", | |
" (u'classicalvalues.com', u'sapo.pt', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'sapo.pt', 1.0),\n", | |
" (u'google.com', u'sapo.pt', 1.0),\n", | |
" (u'kenyaunlimited.com', u'sapo.pt', 1.0),\n", | |
" (u'memeorandum.com', u'sapo.pt', 1.0),\n", | |
" (u'mesvilaweb.cat', u'sapo.pt', 1.0),\n", | |
" (u'mu.nu', u'sapo.pt', 1.0),\n", | |
" (u'pjmedia.com', u'sapo.pt', 1.0),\n", | |
" (u'qwika.com', u'wikidict.de', 1.0),\n", | |
" (u'blogia.com', u'zoomblog.com', 1.0),\n", | |
" (u'classicalvalues.com', u'zoomblog.com', 1.0),\n", | |
" (u'eltangoysusinvitados.com', u'zoomblog.com', 1.0),\n", | |
" (u'google.com', u'zoomblog.com', 1.0),\n", | |
" (u'kenyaunlimited.com', u'zoomblog.com', 1.0),\n", | |
" (u'memeorandum.com', u'zoomblog.com', 1.0),\n", | |
" (u'mesvilaweb.cat', u'zoomblog.com', 1.0),\n", | |
" (u'mu.nu', u'zoomblog.com', 1.0),\n", | |
" (u'pjmedia.com', u'zoomblog.com', 1.0),\n", | |
" (u'sapo.pt', u'zoomblog.com', 1.0),\n", | |
" (u'blogalaxia.com', u'blogia.com', 0.5),\n", | |
" (u'blogalaxia.com', u'classicalvalues.com', 0.5),\n", | |
" (u'blogalaxia.com', u'eltangoysusinvitados.com', 0.5),\n", | |
" (u'blogalaxia.com', u'google.com', 0.5),\n", | |
" (u'blogalaxia.com', u'kenyaunlimited.com', 0.5),\n", | |
" (u'blogalaxia.com', u'memeorandum.com', 0.5),\n", | |
" (u'blogalaxia.com', u'mesvilaweb.cat', 0.5),\n", | |
" (u'blogalaxia.com', u'mu.nu', 0.5),\n", | |
" (u'blogalaxia.com', u'pjmedia.com', 0.5),\n", | |
" (u'blogalaxia.com', u'sapo.pt', 0.5),\n", | |
" (u'wikipedia.org', u'wiktionary.org', 0.5),\n", | |
" (u'blogalaxia.com', u'zoomblog.com', 0.5),\n", | |
" (u'amazon.co.jp', u'amazon.de', 0.4444444444444444),\n", | |
" (u'blogia.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'classicalvalues.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'eltangoysusinvitados.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'google.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'kenyaunlimited.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'memeorandum.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'mesvilaweb.cat', u'typepad.com', 0.3333333333333333),\n", | |
" (u'mu.nu', u'typepad.com', 0.3333333333333333),\n", | |
" (u'pjmedia.com', u'typepad.com', 0.3333333333333333),\n", | |
" (u'sapo.pt', u'typepad.com', 0.3333333333333333),\n", | |
" (u'typepad.com', u'zoomblog.com', 0.3333333333333333),\n", | |
" (u'amazon.co.jp', u'amazon.com', 0.25),\n", | |
" (u'blogalaxia.com', u'typepad.com', 0.25),\n", | |
" (u'amazon.com', u'amazon.de', 0.20588235294117646),\n", | |
" (u'blogspot.com', u'wordpress.com', 0.17857142857142858),\n", | |
" (u'animationplayhouse.com', u'wordpress.com', 0.1),\n", | |
" (u'blogia.com', u'wordpress.com', 0.1),\n", | |
" (u'classicalvalues.com', u'wordpress.com', 0.1),\n", | |
" (u'eltangoysusinvitados.com', u'wordpress.com', 0.1),\n", | |
" (u'google.com', u'wordpress.com', 0.1),\n", | |
" (u'kenyaunlimited.com', u'wordpress.com', 0.1),\n", | |
" (u'memeorandum.com', u'wordpress.com', 0.1),\n", | |
" (u'mesvilaweb.cat', u'wordpress.com', 0.1),\n", | |
" (u'mu.nu', u'wordpress.com', 0.1),\n", | |
" (u'pjmedia.com', u'wordpress.com', 0.1),\n", | |
" (u'qwika.com', u'wordpress.com', 0.1),\n", | |
" (u'sapo.pt', u'wordpress.com', 0.1),\n", | |
" (u'wikidict.de', u'wordpress.com', 0.1),\n", | |
" (u'wordpress.com', u'zoomblog.com', 0.1),\n", | |
" (u'blogalaxia.com', u'wordpress.com', 0.09090909090909091),\n", | |
" (u'typepad.com', u'wordpress.com', 0.08333333333333333),\n", | |
" (u'blogspot.com', u'youtube.com', 0.08333333333333333),\n", | |
" (u'tumblr.com', u'wordpress.com', 0.0625),\n", | |
" (u'azspot.net', u'youtube.com', 0.0625),\n", | |
" (u'downthisvideo.com', u'youtube.com', 0.0625),\n", | |
" (u'flickr.com', u'youtube.com', 0.05555555555555555),\n", | |
" (u'animationplayhouse.com', u'blogspot.com', 0.043478260869565216),\n", | |
" (u'azspot.net', u'blogspot.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'downthisvideo.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'over-blog.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'qwika.com', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'wikidict.de', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'wiktionary.org', 0.043478260869565216),\n", | |
" (u'blogspot.com', u'wikipedia.org', 0.041666666666666664),\n", | |
" (u'blogspot.com', u'flickr.com', 0.04),\n", | |
" (u'amazon.com', u'img-dpreview.com', 0.04),\n", | |
" (u'blogspot.com', u'typepad.com', 0.04),\n", | |
" (u'blogspot.com', u'tumblr.com', 0.034482758620689655),\n", | |
" (u'amazon.com', u'blogspot.com', 0.02127659574468085)]" | |
] | |
} | |
], | |
"prompt_number": 12 | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
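
For readers without the Blaze version used in the notebook, the same join-and-group-by logic can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the function name `jaccard_pairs` and the toy edge list are hypothetical, it assumes Python 3 (the notebook itself ran on Python 2), and it assumes duplicate edges should count once.

```python
from collections import Counter, defaultdict
from itertools import combinations


def jaccard_pairs(arcs):
    """Return {(a, b): jaccard} for node pairs a < b that share an out-neighbor.

    `arcs` is an iterable of (source, destination) pairs.
    """
    arcs = set(arcs)  # assumption: a repeated edge counts only once

    # Out-degree: number of distinct destinations per source
    # (plays the role of the By(...) on arcs['source']).
    out_degree = Counter(source for source, _ in arcs)

    # Group sources by destination; this stands in for the self-join
    # of arcs with itself on 'destination'.
    sources_of = defaultdict(set)
    for source, destination in arcs:
        sources_of[destination].add(source)

    # Count shared destinations per unordered pair. Sorting before
    # pairing keeps a < b, which avoids double entries and self-pairs.
    shared = Counter()
    for sources in sources_of.values():
        for a, b in combinations(sorted(sources), 2):
            shared[(a, b)] += 1

    # |union| = deg(a) + deg(b) - |intersection|
    return {
        (a, b): n / (out_degree[a] + out_degree[b] - n)
        for (a, b), n in shared.items()
    }


# Hypothetical toy graph: nodes 1 and 2 link to the same two pages,
# node 3 shares only one page with each of them.
toy_arcs = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 10), (3, 12)]
scores = jaccard_pairs(toy_arcs)
for pair in sorted(scores):
    print(pair, scores[pair])
```

On the toy graph, the pair `(1, 2)` scores 1.0 because both nodes link to exactly the same pages, while `(1, 3)` and `(2, 3)` each share one of three distinct pages and score 1/3. Only pairs with at least one shared out-neighbor appear in the result, which matches the inner-join behavior of the symbolic version.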