@bourque
Created May 22, 2017 18:37
PyCon 2017 Functional Programming Tutorial
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using Functional Programming for efficient Data Processing and Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part I -- Intro to functional programming\n",
"\n",
"- flat data (e.g. CSV) and binary, data storage\n",
"- Unstructed binary (e.g. microsoft word)\n",
"- ETL - Extract, Transform, Load\n",
"- methods in classes can change the attribute of the class (disadvantage in eyes of functional programming)\n",
"- `lru_cache` -- least recently used cache\n",
"- functional programming uses a lot of generators and yeilds\n",
"- `itertools.islice` gets a piece of a generator\n",
"- `itertools.accumulate `"
]
},
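{
"cell_type": "markdown",
"metadata": {},
"source": [
"The generator bullets above can be illustrated with a minimal sketch. The `squares` generator and the variable names are made up for illustration and were not part of the tutorial.\n",
"\n",
"```python\n",
"from itertools import accumulate, count, islice\n",
"\n",
"def squares():\n",
"    # Infinite generator: each value is computed only when requested.\n",
"    for n in count(1):\n",
"        yield n * n\n",
"\n",
"first_five = list(islice(squares(), 5))        # take a finite piece of the generator\n",
"running_totals = list(accumulate(first_five))  # running sum of that piece\n",
"print(first_five)       # [1, 4, 9, 16, 25]\n",
"print(running_totals)   # [1, 5, 14, 30, 55]\n",
"```\n",
"\n",
"Because `islice` stops after five items, the infinite generator is never exhausted."
]
},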
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from math import sqrt, pow"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from functools import lru_cache\n",
"@lru_cache\n",
"def my_func():\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def make_rect(x, y):\n",
" return (x, y)\n",
"\n",
"def grow_rect(rect, amount):\n",
" return (rect[0] * amount, rect[1])\n",
"\n",
"def get_length (rect):\n",
" return rect[0]\n",
"\n",
"def get_area (rect):\n",
" return rect[0] * rect[1]"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def get_hyp(rectangle):\n",
" return sqrt(rectangle[0]**2 + rectangle[1]**2)\n",
" #return sqrt(sum(pow(r, 2) for r in rect)) # another solution\n",
"\n",
"def get_ratio(x, y, factor):\n",
" rect = make_rect(x, y)\n",
" big_rect = grow_rect(rect, factor)\n",
" z = get_hyp(rect)\n",
" h = get_hyp(big_rect)\n",
"\n",
" return z / h "
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.7905694150420948\n",
"0.6201736729460423\n",
"0.29637448918190956\n",
"0.6201736729460423\n"
]
}
],
"source": [
"print(get_ratio(1, 2, 2))\n",
"print(get_ratio(1, 2, 3))\n",
"print(get_ratio(3, 2, 4))\n",
"print(get_ratio(1, 2, 3))"
]
},
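{
"cell_type": "markdown",
"metadata": {},
"source": [
"The repeated `get_ratio(1, 2, 3)` call above is a natural place to try `lru_cache`. The following is a sketch under the assumption that the function stays pure; the arithmetic is inlined here only so the example is self-contained.\n",
"\n",
"```python\n",
"from functools import lru_cache\n",
"from math import sqrt\n",
"\n",
"@lru_cache(maxsize=None)\n",
"def get_ratio(x, y, factor):\n",
"    # Same arithmetic as the cells above: only the first side is scaled.\n",
"    z = sqrt(x ** 2 + y ** 2)\n",
"    h = sqrt((x * factor) ** 2 + y ** 2)\n",
"    return z / h\n",
"\n",
"get_ratio(1, 2, 3)   # computed (cache miss)\n",
"get_ratio(1, 2, 3)   # served from the cache (cache hit)\n",
"info = get_ratio.cache_info()\n",
"print(info.hits, info.misses)  # 1 1\n",
"```\n",
"\n",
"Caching is only safe here because the result depends on nothing but the arguments."
]
},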
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part II -- You might not need Pandas\n",
"\n",
"- Can read in CSV with builtin `csv` and `io` libraries\n",
"- Excel data can be read by `xlrd`\n",
"- Can sort dictionary items with `operator.itemgetter`, e.g. `_sorted = sorted(records, key=itemgetter('amount')`\n",
"- `meza`: \"A lightweight version of pandas\" - the author himself \n",
"- `pandas` is objects/methods whereas `meza` provides functions"
]
},
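{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal standard-library sketch of the first and third bullets. The CSV content and the column names are hypothetical and chosen only for illustration.\n",
"\n",
"```python\n",
"import csv\n",
"import io\n",
"from operator import itemgetter\n",
"\n",
"# Hypothetical in-memory CSV; a file object from open(...) works the same way.\n",
"raw = io.StringIO('name,amount\\nrent,1200\\ncoffee,4\\nbooks,60\\n')\n",
"\n",
"# DictReader yields one dict per row; the generator converts amounts to int lazily.\n",
"records = ({'name': row['name'], 'amount': int(row['amount'])} for row in csv.DictReader(raw))\n",
"\n",
"_sorted = sorted(records, key=itemgetter('amount'))\n",
"print([r['name'] for r in _sorted])  # ['coffee', 'books', 'rent']\n",
"```\n",
"\n",
"Converting `amount` to `int` matters: sorting the raw strings would order them lexicographically."
]
},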
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting meza\n",
" Downloading meza-0.37.0-py2.py3-none-any.whl (63kB)\n",
"\u001b[K 100% |████████████████████████████████| 71kB 282kB/s \n",
"\u001b[?25hRequirement already satisfied: PyYAML<4.0.0,>=3.11 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Requirement already satisfied: beautifulsoup4<5.0.0,>=4.4.1 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Collecting pygogo<0.11.0,>=0.10.0 (from meza)\n",
" Downloading pygogo-0.10.0-py2.py3-none-any.whl\n",
"Requirement already satisfied: requests<3.0.0,>=2.10.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Requirement already satisfied: xlrd<0.10.0,>=0.9.3 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.4.2 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Collecting ijson<3.0.0,>=2.2 (from meza)\n",
" Downloading ijson-2.3-py2.py3-none-any.whl\n",
"Requirement already satisfied: chardet<3.0.0,>=2.3.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Collecting dbfread==2.0.4 (from meza)\n",
" Downloading dbfread-2.0.4-py2.py3-none-any.whl\n",
"Requirement already satisfied: six<2.0.0,>=1.10.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza)\n",
"Collecting python-slugify<0.1.0,>=0.0.7 (from meza)\n",
" Downloading python-slugify-0.0.9.tar.gz\n",
"Collecting future~=0.15.2 (from pygogo<0.11.0,>=0.10.0->meza)\n",
" Downloading future-0.15.2.tar.gz (1.6MB)\n",
"\u001b[K 100% |████████████████████████████████| 1.6MB 108kB/s \n",
"\u001b[?25hCollecting Unidecode>=0.04.16 (from python-slugify<0.1.0,>=0.0.7->meza)\n",
" Downloading Unidecode-0.04.20-py2.py3-none-any.whl (228kB)\n",
"\u001b[K 100% |████████████████████████████████| 235kB 100kB/s \n",
"\u001b[?25hBuilding wheels for collected packages: python-slugify, future\n",
" Running setup.py bdist_wheel for python-slugify ... \u001b[?25l-\b \b\\\b \b|\b \b/\b \b-\b \bdone\n",
"\u001b[?25h Stored in directory: /Users/bourque/Library/Caches/pip/wheels/24/78/11/146ac473f51343795d182be2e51ea7201d56191fc3dc75df88\n",
" Running setup.py bdist_wheel for future ... \u001b[?25l-\b \b\\\b \b|\b \bdone\n",
"\u001b[?25h Stored in directory: /Users/bourque/Library/Caches/pip/wheels/11/c5/d2/ad287de27d0f0d646f119dcffb921f4e63df128f28ab0a1bda\n",
"Successfully built python-slugify future\n",
"Installing collected packages: future, pygogo, ijson, dbfread, Unidecode, python-slugify, meza\n",
"Successfully installed Unidecode-0.4.20 dbfread-2.0.4 future-0.15.2 ijson-2.3 meza-0.37.0 pygogo-0.10.0 python-slugify-0.0.9\n"
]
}
],
"source": [
"!pip install meza"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from statistics import median\n",
"from meza.process import aggregate\n",
"from meza.process import group\n",
"from meza.convert import records2csv\n",
"from meza.io import write\n",
"from meza import convert as cv\n",
"from meza import process as pr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### My solution"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"records = [{'factor' : f, 'length': 2, 'width': 2, 'ratio' : get_ratio(2, 2, f)} for f in range(1,21)]"
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"groups = group(records, lambda record: record['ratio'] // .25)"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "unexpected EOF while parsing (<ipython-input-235-9de82e3b0369>, line 2)",
"output_type": "error",
"traceback": [
"\u001b[0;36m File \u001b[0;32m\"<ipython-input-235-9de82e3b0369>\"\u001b[0;36m, line \u001b[0;32m2\u001b[0m\n\u001b[0;31m results.append((next_group[0]), (aggregate(next_group[1], 'ratio', median))\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m unexpected EOF while parsing\n"
]
}
],
"source": [
"aggregate(next_group[1], 'ratio', median)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### His solution"
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def aggregator(group):\n",
" ratios = (g['ratio'] for g in group)\n",
" return median(ratios)"
]
},
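{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, the same group-then-aggregate idea can be sketched with the standard library only. This does not use `meza`, and the `ratio` values below are made up for illustration.\n",
"\n",
"```python\n",
"from itertools import groupby\n",
"from statistics import median\n",
"\n",
"# Hypothetical records; the ratios are illustrative, not computed by get_ratio.\n",
"records = [{'ratio': r} for r in (0.1, 0.2, 0.3, 0.45, 0.6, 0.7)]\n",
"\n",
"def keyfunc(record):\n",
"    # Bin index: 0.0 for [0, .25), 1.0 for [.25, .5), and so on.\n",
"    return record['ratio'] // .25\n",
"\n",
"# groupby only merges adjacent items, so sort by the same key first.\n",
"ordered = sorted(records, key=keyfunc)\n",
"results = [\n",
"    {'key': key, 'median': median(r['ratio'] for r in grp)}\n",
"    for key, grp in groupby(ordered, key=keyfunc)\n",
"]\n",
"print(results)\n",
"```\n",
"\n",
"Each entry in `results` pairs a bin key with the median ratio of the records in that bin."
]
},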
{
"cell_type": "code",
"execution_count": 245,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "TypeError",
"evalue": "group() argument after ** must be a mapping, not set",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-245-5f05d8bd39c2>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mkwargs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m'aggregator'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maggregator\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mgkeyfunc\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mr\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'ratio'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m//\u001b[0m \u001b[0;36m.25\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mgroups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrecords\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgkeyfunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: group() argument after ** must be a mapping, not set"
]
}
],
"source": [
"kwargs = {'aggregator', aggregator}\n",
"gkeyfunc = lambda r: r['ratio'] // .25\n",
"groups = pr.group(records, gkeyfunc, **kwargs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"results = [{'key' : k, }]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part III -- `riko`\n",
"\n",
"- A stream processinng engine modeled after Yahoo! pipes"
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting riko\n",
" Downloading riko-0.51.0-py2.py3-none-any.whl (700kB)\n",
"\u001b[K 100% |████████████████████████████████| 706kB 215kB/s \n",
"\u001b[?25hCollecting Mezmorize<0.19.0,>=0.18.2 (from riko)\n",
" Downloading mezmorize-0.18.2-py2.py3-none-any.whl\n",
"Requirement already satisfied: meza<0.38.0,>=0.37.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from riko)\n",
"Requirement already satisfied: chardet<3.0.0,>=2.3.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from riko)\n",
"Requirement already satisfied: pygogo<0.11.0,>=0.9.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from riko)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.4.2 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from riko)\n",
"Collecting Babel<3.0.0,>=2.3.4 (from riko)\n",
" Downloading Babel-2.4.0-py2.py3-none-any.whl (6.8MB)\n",
"\u001b[K 100% |████████████████████████████████| 6.8MB 49kB/s \n",
"\u001b[?25hCollecting feedparser<6.0.0,>=5.2.1 (from riko)\n",
" Downloading feedparser-5.2.1.zip (1.2MB)\n",
"\u001b[K 100% |████████████████████████████████| 1.2MB 197kB/s \n",
"\u001b[?25hCollecting html5lib==0.999999999 (from riko)\n",
" Downloading html5lib-0.999999999-py2.py3-none-any.whl (112kB)\n",
"\u001b[K 100% |████████████████████████████████| 122kB 315kB/s \n",
"\u001b[?25hCollecting pytz>=2016.10 (from riko)\n",
" Downloading pytz-2017.2-py2.py3-none-any.whl (484kB)\n",
"\u001b[K 100% |████████████████████████████████| 491kB 213kB/s \n",
"\u001b[?25hRequirement already satisfied: six<2.0.0,>=1.10.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from riko)\n",
"Requirement already satisfied: requests<3.0.0,>=2.10.0 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from riko)\n",
"Collecting werkzeug<=0.13.0,>=0.12.1 (from Mezmorize<0.19.0,>=0.18.2->riko)\n",
" Downloading Werkzeug-0.12.2-py2.py3-none-any.whl (312kB)\n",
"\u001b[K 100% |████████████████████████████████| 317kB 252kB/s \n",
"\u001b[?25hRequirement already satisfied: xlrd<0.10.0,>=0.9.3 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza<0.38.0,>=0.37.0->riko)\n",
"Requirement already satisfied: dbfread==2.0.4 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza<0.38.0,>=0.37.0->riko)\n",
"Requirement already satisfied: python-slugify<0.1.0,>=0.0.7 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza<0.38.0,>=0.37.0->riko)\n",
"Requirement already satisfied: PyYAML<4.0.0,>=3.11 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza<0.38.0,>=0.37.0->riko)\n",
"Requirement already satisfied: beautifulsoup4<5.0.0,>=4.4.1 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza<0.38.0,>=0.37.0->riko)\n",
"Requirement already satisfied: ijson<3.0.0,>=2.2 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from meza<0.38.0,>=0.37.0->riko)\n",
"Requirement already satisfied: future~=0.15.2 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from pygogo<0.11.0,>=0.9.0->riko)\n",
"Collecting webencodings (from html5lib==0.999999999->riko)\n",
" Downloading webencodings-0.5.1-py2.py3-none-any.whl\n",
"Requirement already satisfied: setuptools>=18.5 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg (from html5lib==0.999999999->riko)\n",
"Requirement already satisfied: Unidecode>=0.04.16 in /Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages (from python-slugify<0.1.0,>=0.0.7->meza<0.38.0,>=0.37.0->riko)\n",
"Building wheels for collected packages: feedparser\n",
" Running setup.py bdist_wheel for feedparser ... \u001b[?25l-\b \b\\\b \b|\b \b/\b \b-\b \b\\\b \bdone\n",
"\u001b[?25h Stored in directory: /Users/bourque/Library/Caches/pip/wheels/15/ce/10/b500f745822ea6db6ea8ed225c06b15c000d71016b89ef9037\n",
"Successfully built feedparser\n",
"Installing collected packages: werkzeug, Mezmorize, pytz, Babel, feedparser, webencodings, html5lib, riko\n",
" Found existing installation: Werkzeug 0.11.11\n",
"\u001b[31m DEPRECATION: Uninstalling a distutils installed project (werkzeug) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.\u001b[0m\n",
" Uninstalling Werkzeug-0.11.11:\n",
" Successfully uninstalled Werkzeug-0.11.11\n",
" Found existing installation: pytz 2016.3\n",
"\u001b[31m DEPRECATION: Uninstalling a distutils installed project (pytz) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.\u001b[0m\n",
" Uninstalling pytz-2016.3:\n",
" Successfully uninstalled pytz-2016.3\n",
" Found existing installation: Babel 2.3.3\n",
" Uninstalling Babel-2.3.3:\n",
" Successfully uninstalled Babel-2.3.3\n",
" Found existing installation: html5lib 0.999\n",
"\u001b[31m DEPRECATION: Uninstalling a distutils installed project (html5lib) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.\u001b[0m\n",
" Uninstalling html5lib-0.999:\n",
" Successfully uninstalled html5lib-0.999\n",
"Successfully installed Babel-2.4.0 Mezmorize-0.18.2 feedparser-5.2.1 html5lib-0.999999999 pytz-2017.2 riko-0.51.0 webencodings-0.5.1 werkzeug-0.12.2\n"
]
}
],
"source": [
"!pip install riko"
]
},
{
"cell_type": "code",
"execution_count": 332,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from riko.collections import SyncPipe\n",
"from riko.modules import fetch, fetchsitefeed"
]
},
{
"cell_type": "code",
"execution_count": 250,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"url = 'www.python.org/events/python-events/'\n",
"_xpath = '/html/body/div/div[3]/div/section'\n",
"xpath = '{}/div/div/ul/li'.format(_xpath)\n",
"xconf = {'url': url, 'xpath': xpath}\n",
"kwargs = {'emit': False, 'token_key': None}\n",
"epath = 'h3.a.content'\n",
"lpath = 'p.span.content'\n",
"rrule = [{'field': 'h3'}, {'field': 'p'}]\n",
"# >>> flow = (\n",
"# ... SyncPipe('xpathfetchpage', conf=xconf)\n",
"# ... .subelement(\n",
"# ... conf={'path': epath},\n",
"# ... assign='event', **kwargs)\n",
"# ... .subelement(\n",
"# ... conf={'path': lpath},\n",
"# ... assign='location', **kwargs)\n",
"# ... .rename(conf={'rule': rrule}))"
]
},
{
"cell_type": "code",
"execution_count": 327,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"url = 'www.python.org/events/python-events/'\n",
"url = 'https://www.python.org/jobs/feed/rss'\n",
"_xpath = '/html/body/div/div[3]/div/section'\n",
"xpath = '{}/div/div/ul/li'.format(_xpath)\n",
"xconf = {'url': url}\n",
"kwargs = {'emit': False, 'token_key': '\\n'}\n",
"epath = 'h3.a.content'\n",
"lpath = 'p.span.content'\n",
"rrule = [{'field': 'summary'}, {'field': 'p'}]"
]
},
{
"cell_type": "code",
"execution_count": 328,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"flow = (SyncPipe('xpathfetchpage', conf=xconf)\n",
" .subelement(conf={'path': epath}, assign='description', **kwargs)\n",
" .subelement(conf={'path': lpath}, assign='description', **kwargs)\n",
" .rename(conf={'rule':rrule}))"
]
},
{
"cell_type": "code",
"execution_count": 329,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"stream = flow.output"
]
},
{
"cell_type": "code",
"execution_count": 330,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "TypeError",
"evalue": "Argument must be bytes or unicode, got 'NoneType'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-330-2799681e0bef>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/riko/modules/__init__.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(item, **kwargs)\u001b[0m\n\u001b[1;32m 349\u001b[0m \u001b[0mstream\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32myield\u001b[0m \u001b[0mpipe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0mparsed\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 350\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 351\u001b[0;31m \u001b[0mstream\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpipe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0mparsed\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 352\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 353\u001b[0m \u001b[0mone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0massignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_assignment\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mskip\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mskip\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mcombined\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/riko/modules/xpathfetchpage.py\u001b[0m in \u001b[0;36mpipe\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 282\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 283\u001b[0m \"\"\"\n\u001b[0;32m--> 284\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/riko/modules/xpathfetchpage.py\u001b[0m in \u001b[0;36mparser\u001b[0;34m(_, objconf, skip, **kwargs)\u001b[0m\n\u001b[1;32m 176\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mfetch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mobjconf\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 177\u001b[0m \u001b[0mroot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxml2etree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxml\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mxml\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhtml5\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mobjconf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhtml5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetroot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 178\u001b[0;31m \u001b[0melements\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxpath\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mroot\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mobjconf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 179\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 180\u001b[0m \u001b[0mitems\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0metree2dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0melements\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/riko/parsers.py\u001b[0m in \u001b[0;36mxpath\u001b[0;34m(tree, path, pos, namespace)\u001b[0m\n\u001b[1;32m 124\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mxpath\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtree\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'/'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpos\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnamespace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 125\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 126\u001b[0;31m \u001b[0melements\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtree\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxpath\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 127\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mAttributeError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 128\u001b[0m \u001b[0mstripped\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlstrip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'/'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32msrc/lxml/lxml.etree.pyx\u001b[0m in \u001b[0;36mlxml.etree._Element.xpath (src/lxml/lxml.etree.c:57924)\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32msrc/lxml/xpath.pxi\u001b[0m in \u001b[0;36mlxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:166960)\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32msrc/lxml/apihelpers.pxi\u001b[0m in \u001b[0;36mlxml.etree._utf8 (src/lxml/lxml.etree.c:30198)\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: Argument must be bytes or unicode, got 'NoneType'"
]
}
],
"source": [
"next(stream)"
]
},
{
"cell_type": "code",
"execution_count": 342,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<generator object pipe at 0x10ab255c8>\n"
]
}
],
"source": [
"url = 'https://www.python.org/jobs/feed/rss'\n",
"stream = fetch.pipe(conf={'url': 'url'})\n",
"print(stream)\n",
"fetch_conf = {'url': url, 'start': '<body>', 'end': '</body>', 'detag': True}\n",
"replace_conf = {'rule': [{'find': 'description', 'replace': '\\n'}]}\n",
"flow = (SyncPipe('fetchpage', conf=fetch_conf) # 2\n",
" .strreplace(conf=replace_conf, assign='content') # 3\n",
" .tokenizer(conf={'delimiter': ' '}, emit=True) # 4\n",
" .count(conf={'count_key': 'content'})) "
]
},
{
"cell_type": "code",
"execution_count": 343,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"stream = flow.output"
]
},
{
"cell_type": "code",
"execution_count": 344,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "StopIteration",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-344-2799681e0bef>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/riko/modules/__init__.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(items, **kwargs)\u001b[0m\n\u001b[1;32m 626\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 627\u001b[0m \u001b[0;31m# operators can only assign one value per item and can't skip items\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 628\u001b[0;31m \u001b[0m_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0massignment\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_assignment\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mcombined\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 629\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 630\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcombined\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'emit'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/Users/bourque/anaconda2/envs/astroconda3/lib/python3.5/site-packages/riko/modules/__init__.py\u001b[0m in \u001b[0;36mget_assignment\u001b[0;34m(result, skip, **kwargs)\u001b[0m\n\u001b[1;32m 98\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 100\u001b[0;31m \u001b[0mfirst_result\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 101\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 102\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mStopIteration\u001b[0m: "
]
}
],
"source": [
"next(stream)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:astroconda3]",
"language": "python",
"name": "conda-env-astroconda3-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}