@skorokithakis
Created December 7, 2013 20:38
{
"metadata": {
"name": "Bloom filters"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": "from pybloom import BloomFilter\nimport os\nimport re",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": "posts = {post_name: open(\"posts/\" + post_name).read() for post_name in os.listdir(\"posts\")}\nsplit_posts = {name: set(re.split(\"\\W+\", contents.lower())) for name, contents in posts.items()}",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": "filters = {}\nfor name, words in split_posts.items():\n filters[name] = BloomFilter(capacity=1000, error_rate=0.01)\n for word in words:\n filters[name].add(word)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": "def search(search_string):\n results = []\n search_terms = re.split(\"\\W+\", search_string)\n for name, filter in filters.items():\n if all(term in filter for term in search_terms):\n results.append(name)\n return results",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": "search(\"python raspberry\")",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 29,
"text": "['2013-06-19 - how-remote-control-rf-devices-raspberry-pi.md',\n '2013-06-09 - how-turn-your-raspberry-pi-infrared-remote-control.md',\n '2013-06-24 - writing-my-first-android-app-control-your-raspberr.md']"
}
],
"prompt_number": 29
}
],
"metadata": {}
}
]
}