@mrocklin
Last active August 29, 2015 14:01
{
"metadata": {
"name": "",
"signature": "sha256:bfcac85dc85e057702a6ed3aa8bbb09fe82669662db1788df85405b059261b2d"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Jaccard Similarity with Blaze"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Jaccard similarity measures the relative connectedness of a pair of nodes in a graph. It is the number of neighbors the two nodes have in common (mutual friends) divided by the number of neighbors that either node has (friends of either node).\n",
"\n",
"$$ J(i,j) = \\frac{|n(i) \\cap n(j)|}{|n(i) \\cup n(j)|}, \\qquad\n",
"n(i) = \\textrm{neighbor set of node } i $$"
]
},
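{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a concrete illustration of the definition, here is a minimal plain-Python sketch on two toy neighbor sets. The helper name `jaccard_sets` and the sets themselves are made up for this example and are not part of Blaze.\n",
"\n",
"```python\n",
"def jaccard_sets(n_i, n_j):\n",
"    # Jaccard similarity of two neighbor sets: |intersection| / |union|\n",
"    n_i, n_j = set(n_i), set(n_j)\n",
"    union = n_i | n_j\n",
"    if not union:  # neither node has neighbors; treat the similarity as 0\n",
"        return 0.0\n",
"    return float(len(n_i & n_j)) / len(union)\n",
"\n",
"# Hypothetical neighbor sets for two nodes\n",
"toy_similarity = jaccard_sets({1, 2, 3}, {2, 3, 4, 5})\n",
"```\n",
"\n",
"The two nodes share two neighbors out of five distinct neighbors in total, so `toy_similarity` is 2/5."
]
},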
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We compute a directed variant of Jaccard similarity, using each node's out-neighbors (the sites it links to), on a small portion of the web graph."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't have the data, download it first:\n",
"\n",
" wget http://webdatacommons.org/hyperlinkgraph/data/example_index\n",
" wget http://webdatacommons.org/hyperlinkgraph/data/example_arcs"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze.data.csv import CSV\n",
"\n",
"# An index mapping website names to numeric identifiers\n",
"\n",
"index_dd = CSV('example_index', columns=['name', 'id'])\n",
"list(index_dd.py[:5]) # The first five index entries"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
"[(u'1000notes.com', 0L),\n",
" (u'100500.tv', 1L),\n",
" (u'abebooks.com', 2L),\n",
" (u'abebooks.de', 3L),\n",
" (u'amazon-presse.de', 4L)]"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hyperlinks between webpages\n",
"\n",
"arcs_dd = CSV('example_arcs', columns=['source', 'destination'])\n",
"list(arcs_dd.py[:10]) # The first ten edges"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"[(7L, 5L),\n",
" (7L, 6L),\n",
" (7L, 8L),\n",
" (7L, 9L),\n",
" (7L, 10L),\n",
" (7L, 12L),\n",
" (7L, 26L),\n",
" (7L, 57L),\n",
" (7L, 70L),\n",
" (7L, 82L)]"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Symbolics\n",
"\n",
"In this section we define Jaccard similarity symbolically. In the next section we apply our symbolic results to data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we create two abstract tables with the same schemas as our data."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze.expr.table import TableSymbol, By, Join, collect\n",
"\n",
"index = TableSymbol('index', index_dd.schema)  # Maps website names to IDs\n",
"arcs = TableSymbol('arcs', arcs_dd.schema)     # Links between website IDs"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We compute the out-degree of each node, a simple split-apply-combine computation."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"out_degree = By(arcs,\n",
"                arcs['source'],\n",
"                arcs['destination'].count()).relabel({'destination': 'degree'})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Split-apply-combine can be composed with `Join` to form more complex computations. Here we join the arcs table to itself on `destination` and count, for each pair of nodes, the out-neighbors they share."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"joined = Join(arcs.relabel({'source': 'a'}),\n",
"              arcs.relabel({'source': 'b'}),\n",
"              'destination')\n",
"\n",
"shared = By(joined,\n",
"            joined[['a', 'b']],\n",
"            joined['destination'].count()).relabel({'destination': 'shared'})\n",
"\n",
"shared = shared[shared['a'] < shared['b']]  # avoid double entries"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
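{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the self-join and group-by compute, here is a plain-Python sketch of the same idea on a toy edge list. It is not how Blaze executes the expression. The names `shared_out_neighbors` and `toy_edges` are hypothetical, and the sketch assumes the edge list contains no duplicate edges.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"from itertools import combinations\n",
"\n",
"def shared_out_neighbors(edges):\n",
"    # Group source nodes by the destination they link to\n",
"    sources_by_dest = defaultdict(set)\n",
"    for source, destination in edges:\n",
"        sources_by_dest[destination].add(source)\n",
"    # Each pair of sources that link to one destination shares that out-neighbor\n",
"    counts = defaultdict(int)\n",
"    for sources in sources_by_dest.values():\n",
"        for a, b in combinations(sorted(sources), 2):  # a < b avoids double entries\n",
"            counts[(a, b)] += 1\n",
"    return dict(counts)\n",
"\n",
"# Hypothetical (source, destination) pairs\n",
"toy_edges = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 11)]\n",
"toy_shared = shared_out_neighbors(toy_edges)\n",
"```\n",
"\n",
"Nodes 1 and 2 both link to 10 and 11, so `toy_shared[(1, 2)]` is 2, while the pairs `(1, 3)` and `(2, 3)` share only node 11."
]
},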
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From there, Jaccard similarity takes two more Joins, which attach the out-degrees of `a` and `b` to each pair, followed by some basic arithmetic on the three counts."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"degrees = Join(out_degree, shared, 'source', 'a').relabel({'degree': 'a_degree', 'source': 'a'})\n",
"degrees = Join(out_degree, degrees, 'source', 'b').relabel({'degree': 'b_degree', 'source': 'b'})\n",
"degrees = degrees[['a', 'b', 'a_degree', 'b_degree', 'shared']]\n",
"\n",
"jaccard = (degrees['shared'] / (degrees['a_degree'] + degrees['b_degree'] - degrees['shared'])).label('jaccard')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
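{
"cell_type": "markdown",
"metadata": {},
"source": [
"The denominator in the expression above follows from inclusion-exclusion. Write $d(i) = |n(i)|$ for the out-degree and $s(i,j) = |n(i) \\cap n(j)|$ for the shared count (symbols introduced here only for illustration):\n",
"\n",
"$$ |n(i) \\cup n(j)| = d(i) + d(j) - s(i,j) \\quad\\Longrightarrow\\quad J(i,j) = \\frac{s(i,j)}{d(i) + d(j) - s(i,j)} $$\n",
"\n",
"For instance, two nodes with out-degrees 3 and 4 that share 2 out-neighbors have $J = 2 / (3 + 4 - 2) = 0.4$."
]
},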
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assemble the result: swap in website names for numeric identifiers, drop self-pairs, and sort by similarity to obtain our final computation."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"result = collect(degrees[['a', 'b']], jaccard)\n",
"result = result[result['a'] != result['b']]\n",
"result = Join(index, result, 'id', 'a').relabel({'name': 'source_name'})\n",
"result = Join(index, result, 'id', 'b').relabel({'name': 'destination_name'})\n",
"result = result[['source_name', 'destination_name', 'jaccard']]\n",
"\n",
"result = result.sort('jaccard', ascending=False)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"result.schema"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"dshape(\"{ source_name : string, destination_name : string, jaccard : float64 }\")"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This symbolic computation was checked as we constructed it (note the inferred schema above), giving interactive users rapid feedback instead of making them wait for an expensive computation to fail."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computation\n",
"\n",
"We execute our symbolic computation against ordinary Python lists."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze.compute.python import compute\n",
"\n",
"arcs_data = list(arcs_dd)\n",
"index_data = list(index_dd)\n",
"\n",
"list(compute(result, {arcs: arcs_data, index: index_data}))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"[(u'google.com', u'zoomblog.com', 1.0),\n",
" (u'memeorandum.com', u'sapo.pt', 1.0),\n",
" (u'kenyaunlimited.com', u'zoomblog.com', 1.0),\n",
" (u'google.com', u'memeorandum.com', 1.0),\n",
" (u'pjmedia.com', u'zoomblog.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'pjmedia.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'google.com', 1.0),\n",
" (u'mesvilaweb.cat', u'pjmedia.com', 1.0),\n",
" (u'classicalvalues.com', u'pjmedia.com', 1.0),\n",
" (u'kenyaunlimited.com', u'sapo.pt', 1.0),\n",
" (u'sapo.pt', u'zoomblog.com', 1.0),\n",
" (u'mesvilaweb.cat', u'zoomblog.com', 1.0),\n",
" (u'classicalvalues.com', u'eltangoysusinvitados.com', 1.0),\n",
" (u'blogia.com', u'kenyaunlimited.com', 1.0),\n",
" (u'blogia.com', u'classicalvalues.com', 1.0),\n",
" (u'kenyaunlimited.com', u'mesvilaweb.cat', 1.0),\n",
" (u'classicalvalues.com', u'zoomblog.com', 1.0),\n",
" (u'mu.nu', u'pjmedia.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'zoomblog.com', 1.0),\n",
" (u'blogia.com', u'zoomblog.com', 1.0),\n",
" (u'kenyaunlimited.com', u'pjmedia.com', 1.0),\n",
" (u'memeorandum.com', u'pjmedia.com', 1.0),\n",
" (u'google.com', u'mesvilaweb.cat', 1.0),\n",
" (u'blogia.com', u'memeorandum.com', 1.0),\n",
" (u'mu.nu', u'zoomblog.com', 1.0),\n",
" (u'blogia.com', u'eltangoysusinvitados.com', 1.0),\n",
" (u'google.com', u'sapo.pt', 1.0),\n",
" (u'mesvilaweb.cat', u'sapo.pt', 1.0),\n",
" (u'kenyaunlimited.com', u'mu.nu', 1.0),\n",
" (u'google.com', u'kenyaunlimited.com', 1.0),\n",
" (u'classicalvalues.com', u'kenyaunlimited.com', 1.0),\n",
" (u'kenyaunlimited.com', u'memeorandum.com', 1.0),\n",
" (u'google.com', u'mu.nu', 1.0),\n",
" (u'blogia.com', u'sapo.pt', 1.0),\n",
" (u'blogia.com', u'pjmedia.com', 1.0),\n",
" (u'classicalvalues.com', u'sapo.pt', 1.0),\n",
" (u'classicalvalues.com', u'mesvilaweb.cat', 1.0),\n",
" (u'blogia.com', u'mesvilaweb.cat', 1.0),\n",
" (u'eltangoysusinvitados.com', u'kenyaunlimited.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'memeorandum.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'mesvilaweb.cat', 1.0),\n",
" (u'mesvilaweb.cat', u'mu.nu', 1.0),\n",
" (u'pjmedia.com', u'sapo.pt', 1.0),\n",
" (u'qwika.com', u'wikidict.de', 1.0),\n",
" (u'eltangoysusinvitados.com', u'sapo.pt', 1.0),\n",
" (u'google.com', u'pjmedia.com', 1.0),\n",
" (u'classicalvalues.com', u'mu.nu', 1.0),\n",
" (u'mu.nu', u'sapo.pt', 1.0),\n",
" (u'blogia.com', u'google.com', 1.0),\n",
" (u'classicalvalues.com', u'google.com', 1.0),\n",
" (u'memeorandum.com', u'mesvilaweb.cat', 1.0),\n",
" (u'eltangoysusinvitados.com', u'mu.nu', 1.0),\n",
" (u'blogia.com', u'mu.nu', 1.0),\n",
" (u'memeorandum.com', u'zoomblog.com', 1.0),\n",
" (u'memeorandum.com', u'mu.nu', 1.0),\n",
" (u'classicalvalues.com', u'memeorandum.com', 1.0),\n",
" (u'blogalaxia.com', u'sapo.pt', 0.5),\n",
" (u'blogalaxia.com', u'mu.nu', 0.5),\n",
" (u'blogalaxia.com', u'mesvilaweb.cat', 0.5),\n",
" (u'blogalaxia.com', u'zoomblog.com', 0.5),\n",
" (u'blogalaxia.com', u'kenyaunlimited.com', 0.5),\n",
" (u'wikipedia.org', u'wiktionary.org', 0.5),\n",
" (u'blogalaxia.com', u'pjmedia.com', 0.5),\n",
" (u'blogalaxia.com', u'eltangoysusinvitados.com', 0.5),\n",
" (u'blogalaxia.com', u'classicalvalues.com', 0.5),\n",
" (u'blogalaxia.com', u'memeorandum.com', 0.5),\n",
" (u'blogalaxia.com', u'blogia.com', 0.5),\n",
" (u'blogalaxia.com', u'google.com', 0.5),\n",
" (u'amazon.co.jp', u'amazon.de', 0.4444444444444444),\n",
" (u'mu.nu', u'typepad.com', 0.3333333333333333),\n",
" (u'google.com', u'typepad.com', 0.3333333333333333),\n",
" (u'classicalvalues.com', u'typepad.com', 0.3333333333333333),\n",
" (u'pjmedia.com', u'typepad.com', 0.3333333333333333),\n",
" (u'eltangoysusinvitados.com', u'typepad.com', 0.3333333333333333),\n",
" (u'sapo.pt', u'typepad.com', 0.3333333333333333),\n",
" (u'mesvilaweb.cat', u'typepad.com', 0.3333333333333333),\n",
" (u'typepad.com', u'zoomblog.com', 0.3333333333333333),\n",
" (u'kenyaunlimited.com', u'typepad.com', 0.3333333333333333),\n",
" (u'memeorandum.com', u'typepad.com', 0.3333333333333333),\n",
" (u'blogia.com', u'typepad.com', 0.3333333333333333),\n",
" (u'blogalaxia.com', u'typepad.com', 0.25),\n",
" (u'amazon.co.jp', u'amazon.com', 0.25),\n",
" (u'amazon.com', u'amazon.de', 0.20588235294117646),\n",
" (u'blogspot.com', u'wordpress.com', 0.17857142857142858),\n",
" (u'sapo.pt', u'wordpress.com', 0.1),\n",
" (u'google.com', u'wordpress.com', 0.1),\n",
" (u'wordpress.com', u'zoomblog.com', 0.1),\n",
" (u'mesvilaweb.cat', u'wordpress.com', 0.1),\n",
" (u'eltangoysusinvitados.com', u'wordpress.com', 0.1),\n",
" (u'memeorandum.com', u'wordpress.com', 0.1),\n",
" (u'mu.nu', u'wordpress.com', 0.1),\n",
" (u'kenyaunlimited.com', u'wordpress.com', 0.1),\n",
" (u'animationplayhouse.com', u'wordpress.com', 0.1),\n",
" (u'classicalvalues.com', u'wordpress.com', 0.1),\n",
" (u'wikidict.de', u'wordpress.com', 0.1),\n",
" (u'blogia.com', u'wordpress.com', 0.1),\n",
" (u'pjmedia.com', u'wordpress.com', 0.1),\n",
" (u'qwika.com', u'wordpress.com', 0.1),\n",
" (u'blogalaxia.com', u'wordpress.com', 0.09090909090909091),\n",
" (u'blogspot.com', u'youtube.com', 0.08333333333333333),\n",
" (u'typepad.com', u'wordpress.com', 0.08333333333333333),\n",
" (u'tumblr.com', u'wordpress.com', 0.0625),\n",
" (u'azspot.net', u'youtube.com', 0.0625),\n",
" (u'downthisvideo.com', u'youtube.com', 0.0625),\n",
" (u'flickr.com', u'youtube.com', 0.05555555555555555),\n",
" (u'blogspot.com', u'wikidict.de', 0.043478260869565216),\n",
" (u'azspot.net', u'blogspot.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'qwika.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'over-blog.com', 0.043478260869565216),\n",
" (u'animationplayhouse.com', u'blogspot.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'downthisvideo.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'wiktionary.org', 0.043478260869565216),\n",
" (u'blogspot.com', u'wikipedia.org', 0.041666666666666664),\n",
" (u'blogspot.com', u'flickr.com', 0.04),\n",
" (u'blogspot.com', u'typepad.com', 0.04),\n",
" (u'amazon.com', u'img-dpreview.com', 0.04),\n",
" (u'blogspot.com', u'tumblr.com', 0.034482758620689655),\n",
" (u'amazon.com', u'blogspot.com', 0.02127659574468085)]"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a small canned dataset, hence the large number of perfect similarities.\n",
"\n",
"Fortunately, the same symbolic computation works with other datasets and with other backends."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Spark Interaction\n",
"\n",
"To demonstrate cross-backend computation, we run the same Jaccard expression with Spark on Resilient Distributed Datasets (RDDs)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pyspark\n",
"sc = pyspark.SparkContext(\"local\", \"Jaccard-demo\") # Just a local Spark context\n",
"\n",
"arcs_rdd = sc.parallelize(arcs_data) # Distribute our lists over Spark \n",
"index_rdd = sc.parallelize(index_data)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"38381\n",
"\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze.compute.spark import compute\n",
"\n",
"rdd_out = compute(result, {arcs: arcs_rdd, index: index_rdd})\n",
"rdd_out # RDDs in, RDDs out."
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"PythonRDD[53] at RDD at PythonRDD.scala:37"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rdd_out.collect()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"[(u'blogia.com', u'classicalvalues.com', 1.0),\n",
" (u'blogia.com', u'eltangoysusinvitados.com', 1.0),\n",
" (u'classicalvalues.com', u'eltangoysusinvitados.com', 1.0),\n",
" (u'blogia.com', u'google.com', 1.0),\n",
" (u'classicalvalues.com', u'google.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'google.com', 1.0),\n",
" (u'blogia.com', u'kenyaunlimited.com', 1.0),\n",
" (u'classicalvalues.com', u'kenyaunlimited.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'kenyaunlimited.com', 1.0),\n",
" (u'google.com', u'kenyaunlimited.com', 1.0),\n",
" (u'blogia.com', u'memeorandum.com', 1.0),\n",
" (u'classicalvalues.com', u'memeorandum.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'memeorandum.com', 1.0),\n",
" (u'google.com', u'memeorandum.com', 1.0),\n",
" (u'kenyaunlimited.com', u'memeorandum.com', 1.0),\n",
" (u'blogia.com', u'mesvilaweb.cat', 1.0),\n",
" (u'classicalvalues.com', u'mesvilaweb.cat', 1.0),\n",
" (u'eltangoysusinvitados.com', u'mesvilaweb.cat', 1.0),\n",
" (u'google.com', u'mesvilaweb.cat', 1.0),\n",
" (u'kenyaunlimited.com', u'mesvilaweb.cat', 1.0),\n",
" (u'memeorandum.com', u'mesvilaweb.cat', 1.0),\n",
" (u'blogia.com', u'mu.nu', 1.0),\n",
" (u'classicalvalues.com', u'mu.nu', 1.0),\n",
" (u'eltangoysusinvitados.com', u'mu.nu', 1.0),\n",
" (u'google.com', u'mu.nu', 1.0),\n",
" (u'kenyaunlimited.com', u'mu.nu', 1.0),\n",
" (u'memeorandum.com', u'mu.nu', 1.0),\n",
" (u'mesvilaweb.cat', u'mu.nu', 1.0),\n",
" (u'blogia.com', u'pjmedia.com', 1.0),\n",
" (u'classicalvalues.com', u'pjmedia.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'pjmedia.com', 1.0),\n",
" (u'google.com', u'pjmedia.com', 1.0),\n",
" (u'kenyaunlimited.com', u'pjmedia.com', 1.0),\n",
" (u'memeorandum.com', u'pjmedia.com', 1.0),\n",
" (u'mesvilaweb.cat', u'pjmedia.com', 1.0),\n",
" (u'mu.nu', u'pjmedia.com', 1.0),\n",
" (u'blogia.com', u'sapo.pt', 1.0),\n",
" (u'classicalvalues.com', u'sapo.pt', 1.0),\n",
" (u'eltangoysusinvitados.com', u'sapo.pt', 1.0),\n",
" (u'google.com', u'sapo.pt', 1.0),\n",
" (u'kenyaunlimited.com', u'sapo.pt', 1.0),\n",
" (u'memeorandum.com', u'sapo.pt', 1.0),\n",
" (u'mesvilaweb.cat', u'sapo.pt', 1.0),\n",
" (u'mu.nu', u'sapo.pt', 1.0),\n",
" (u'pjmedia.com', u'sapo.pt', 1.0),\n",
" (u'qwika.com', u'wikidict.de', 1.0),\n",
" (u'blogia.com', u'zoomblog.com', 1.0),\n",
" (u'classicalvalues.com', u'zoomblog.com', 1.0),\n",
" (u'eltangoysusinvitados.com', u'zoomblog.com', 1.0),\n",
" (u'google.com', u'zoomblog.com', 1.0),\n",
" (u'kenyaunlimited.com', u'zoomblog.com', 1.0),\n",
" (u'memeorandum.com', u'zoomblog.com', 1.0),\n",
" (u'mesvilaweb.cat', u'zoomblog.com', 1.0),\n",
" (u'mu.nu', u'zoomblog.com', 1.0),\n",
" (u'pjmedia.com', u'zoomblog.com', 1.0),\n",
" (u'sapo.pt', u'zoomblog.com', 1.0),\n",
" (u'blogalaxia.com', u'blogia.com', 0.5),\n",
" (u'blogalaxia.com', u'classicalvalues.com', 0.5),\n",
" (u'blogalaxia.com', u'eltangoysusinvitados.com', 0.5),\n",
" (u'blogalaxia.com', u'google.com', 0.5),\n",
" (u'blogalaxia.com', u'kenyaunlimited.com', 0.5),\n",
" (u'blogalaxia.com', u'memeorandum.com', 0.5),\n",
" (u'blogalaxia.com', u'mesvilaweb.cat', 0.5),\n",
" (u'blogalaxia.com', u'mu.nu', 0.5),\n",
" (u'blogalaxia.com', u'pjmedia.com', 0.5),\n",
" (u'blogalaxia.com', u'sapo.pt', 0.5),\n",
" (u'wikipedia.org', u'wiktionary.org', 0.5),\n",
" (u'blogalaxia.com', u'zoomblog.com', 0.5),\n",
" (u'amazon.co.jp', u'amazon.de', 0.4444444444444444),\n",
" (u'blogia.com', u'typepad.com', 0.3333333333333333),\n",
" (u'classicalvalues.com', u'typepad.com', 0.3333333333333333),\n",
" (u'eltangoysusinvitados.com', u'typepad.com', 0.3333333333333333),\n",
" (u'google.com', u'typepad.com', 0.3333333333333333),\n",
" (u'kenyaunlimited.com', u'typepad.com', 0.3333333333333333),\n",
" (u'memeorandum.com', u'typepad.com', 0.3333333333333333),\n",
" (u'mesvilaweb.cat', u'typepad.com', 0.3333333333333333),\n",
" (u'mu.nu', u'typepad.com', 0.3333333333333333),\n",
" (u'pjmedia.com', u'typepad.com', 0.3333333333333333),\n",
" (u'sapo.pt', u'typepad.com', 0.3333333333333333),\n",
" (u'typepad.com', u'zoomblog.com', 0.3333333333333333),\n",
" (u'amazon.co.jp', u'amazon.com', 0.25),\n",
" (u'blogalaxia.com', u'typepad.com', 0.25),\n",
" (u'amazon.com', u'amazon.de', 0.20588235294117646),\n",
" (u'blogspot.com', u'wordpress.com', 0.17857142857142858),\n",
" (u'animationplayhouse.com', u'wordpress.com', 0.1),\n",
" (u'blogia.com', u'wordpress.com', 0.1),\n",
" (u'classicalvalues.com', u'wordpress.com', 0.1),\n",
" (u'eltangoysusinvitados.com', u'wordpress.com', 0.1),\n",
" (u'google.com', u'wordpress.com', 0.1),\n",
" (u'kenyaunlimited.com', u'wordpress.com', 0.1),\n",
" (u'memeorandum.com', u'wordpress.com', 0.1),\n",
" (u'mesvilaweb.cat', u'wordpress.com', 0.1),\n",
" (u'mu.nu', u'wordpress.com', 0.1),\n",
" (u'pjmedia.com', u'wordpress.com', 0.1),\n",
" (u'qwika.com', u'wordpress.com', 0.1),\n",
" (u'sapo.pt', u'wordpress.com', 0.1),\n",
" (u'wikidict.de', u'wordpress.com', 0.1),\n",
" (u'wordpress.com', u'zoomblog.com', 0.1),\n",
" (u'blogalaxia.com', u'wordpress.com', 0.09090909090909091),\n",
" (u'typepad.com', u'wordpress.com', 0.08333333333333333),\n",
" (u'blogspot.com', u'youtube.com', 0.08333333333333333),\n",
" (u'tumblr.com', u'wordpress.com', 0.0625),\n",
" (u'azspot.net', u'youtube.com', 0.0625),\n",
" (u'downthisvideo.com', u'youtube.com', 0.0625),\n",
" (u'flickr.com', u'youtube.com', 0.05555555555555555),\n",
" (u'animationplayhouse.com', u'blogspot.com', 0.043478260869565216),\n",
" (u'azspot.net', u'blogspot.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'downthisvideo.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'over-blog.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'qwika.com', 0.043478260869565216),\n",
" (u'blogspot.com', u'wikidict.de', 0.043478260869565216),\n",
" (u'blogspot.com', u'wiktionary.org', 0.043478260869565216),\n",
" (u'blogspot.com', u'wikipedia.org', 0.041666666666666664),\n",
" (u'blogspot.com', u'flickr.com', 0.04),\n",
" (u'amazon.com', u'img-dpreview.com', 0.04),\n",
" (u'blogspot.com', u'typepad.com', 0.04),\n",
" (u'blogspot.com', u'tumblr.com', 0.034482758620689655),\n",
" (u'amazon.com', u'blogspot.com', 0.02127659574468085)]"
]
}
],
"prompt_number": 12
}
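,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two backends order tied rows differently, so the two outputs above cannot be compared with `==` directly. Here is a minimal sketch of an order-insensitive comparison, assuming both results are lists of `(source_name, destination_name, jaccard)` tuples. The helper `same_rows` and the sample rows are hypothetical.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"def same_rows(left, right, ndigits=12):\n",
"    # Compare two result lists as multisets, rounding the float column\n",
"    def normalize(rows):\n",
"        return Counter((a, b, round(score, ndigits)) for a, b, score in rows)\n",
"    return normalize(left) == normalize(right)\n",
"\n",
"# Hypothetical rows standing in for the two backends' outputs\n",
"python_rows = [('a.com', 'b.com', 1.0), ('c.com', 'd.com', 1.0), ('a.com', 'c.com', 0.5)]\n",
"spark_rows = [('c.com', 'd.com', 1.0), ('a.com', 'b.com', 1.0), ('a.com', 'c.com', 0.5)]\n",
"rows_match = same_rows(python_rows, spark_rows)\n",
"```\n",
"\n",
"In this notebook the same check could be applied to the list produced by the Python backend and to `rdd_out.collect()`."
]
}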
],
"metadata": {}
}
]
}