Bare-bones Cython tutorial for data scientists
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bare-bones Cython for data scientists\n",
"\n",
"##### By David Taylor, www.prooffreader.com\n",
"\n",
"Of course, this tutorial could be of use to anyone, but I thought it particularly appropriate for data scientists, who often need to apply a single function to a long list of items. In that situation, a significant speedup of that one function can be worth the extra effort of using Cython.\n",
"\n",
"Cython is a superset of the Python language that is compiled to C ahead of time, before the code runs (the standard Python interpreter is called CPython, confusingly). While one can take advantage of many Cython features that are not available in ordinary Python (see [Cython's docs](http://docs.cython.org/src/userguide/language_basics.html)), this tutorial focuses simply on declaring variable types.\n",
"\n",
"We will implement a function that finds the first 1000 prime numbers (taken from the basic tutorial at cython.org) three ways: in CPython, in Cython with code identical to the CPython version, and in Cython with typed variables."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. CPython\n",
"\n",
"This is ordinary Python code. For simplicity, I've timed the function with the ``time`` library instead of using profiling tools."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import time\n",
"\n",
"def primes(kmax):\n",
"    p = {}\n",
"    result = []\n",
"    if kmax > 10000:\n",
"        kmax = 10000\n",
"    k = 0\n",
"    n = 2\n",
"    while k < kmax:\n",
"        i = 0\n",
"        while i < k and n % p[i] != 0:\n",
"            i += 1\n",
"        if i == k:\n",
"            p[k] = n\n",
"            k += 1\n",
"            result.append(n)\n",
"        n += 1\n",
"    return result"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0995\n"
]
}
],
"source": [
"start = time.time()\n",
"x = primes(1000)\n",
"print(round(time.time() - start, 4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It takes about 100 milliseconds on a nice server."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Cython with CPython code\n",
"\n",
"We can take the above function definition verbatim, save it as a ``.pyx`` file, and then compile and run it with a few lines of code from the regular CPython interpreter. Just these few steps often give a decent time savings: in this case, on my machine, it runs about 1.4 times as fast (roughly 30% less time)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with open('cpython_prime.pyx', 'w+') as f:\n",
"    f.write(\"\"\"def primes(kmax):\n",
"    p = {}\n",
"    result = []\n",
"    if kmax > 10000:\n",
"        kmax = 10000\n",
"    k = 0\n",
"    n = 2\n",
"    while k < kmax:\n",
"        i = 0\n",
"        while i < k and n % p[i] != 0:\n",
"            i += 1\n",
"        if i == k:\n",
"            p[k] = n\n",
"            k += 1\n",
"            result.append(n)\n",
"        n += 1\n",
"    return result\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0704\n"
]
}
],
"source": [
"import pyximport\n",
"pyximport.install()\n",
"import cpython_prime # We import the .pyx file we created above\n",
"\n",
"start = time.time()\n",
"x = cpython_prime.primes(1000)\n",
"print(round(time.time() - start, 4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Cython with typed variables\n",
"\n",
"Finally, we can modify the code to declare variable types: most commonly ``int`` and ``float``, and you can also declare a C array (used here in place of the dict with integer keys). There are plenty of other possibilities outlined on cython.org, including strings."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with open('cython_prime.pyx', 'w+') as f:\n",
"    f.write(\"\"\"def primes(int kmax):\n",
"    cdef int n, k, i\n",
"    cdef int p[10000]\n",
"    result = []\n",
"    if kmax > 10000:\n",
"        kmax = 10000\n",
"    k = 0\n",
"    n = 2\n",
"    while k < kmax:\n",
"        i = 0\n",
"        while i < k and n % p[i] != 0:\n",
"            i = i + 1\n",
"        if i == k:\n",
"            p[k] = n\n",
"            k = k + 1\n",
"            result.append(n)\n",
"        n = n + 1\n",
"    return result\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0023\n"
]
}
],
"source": [
"import cython_prime\n",
"\n",
"start = time.time()\n",
"x = cython_prime.primes(1000)\n",
"print(round(time.time() - start, 4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YMMV, but on my system the function runs more than 40 times faster! A time saving like this is certainly worth a few extra lines of code."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
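
The timings in this notebook each come from a single ``time.time()`` measurement, so they can be noisy. As a rough sketch (not part of the original notebook), the standard-library ``timeit`` module can repeat a measurement and keep the best run. The helpers ``best_time`` and ``speedup`` are hypothetical names introduced here for illustration, and ``primes`` is the pure-Python function from section 1. The last two lines recompute the speedup ratios from the timings recorded above.

```python
import timeit


def primes(kmax):
    """Pure-Python reference: return the first kmax primes (capped at 10000)."""
    p = {}
    result = []
    if kmax > 10000:
        kmax = 10000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i += 1
        if i == k:
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result


def best_time(func, arg, repeat=3, number=1):
    """Hypothetical helper: best seconds per call over `repeat` timing runs."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Time the pure-Python version; the result depends on the machine.
baseline = best_time(primes, 1000)
print("pure Python: {:.4f} s".format(baseline))

# Ratios implied by the single-run timings recorded in the notebook.
print(round(speedup(0.0995, 0.0704), 1))  # untyped Cython vs. CPython -> 1.4
print(round(speedup(0.0995, 0.0023), 1))  # typed Cython vs. CPython -> 43.3
```

To compare a compiled module the same way, one could pass its function instead, for example ``best_time(cython_prime.primes, 1000)``, assuming the module has already been imported through ``pyximport`` as in the notebook.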