arogozhnikov/SpeedBenchmarks.ipynb

## SpeedBenchmarks.ipynb
{"nbformat_minor": 0, "cells": [{"source": "## Log likelihood benchmark\n\ntested: \n0. numpy + scipy's pdf\n1. numpy\n2. cython\n4. numexpr\n3. theano\n4. parakeet\n\ncomputing log-likelihood for normal distribution \n\n__Notes__\n1. Not optimizing computations here (but in theory theano and parakeet may remove unnecessary computations)\n2. This test includes exp, log, division and summation of array\n2. Everything is running on CPU in one thread, this limitation is for 'fairness' of tests\n", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import numpy\nimport scipy\nfrom scipy.stats import norm\nimport theano", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "### Scipy implementation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "def llh_scipy(data, mean, sigma):\n    lh = norm(mean, sigma).pdf(data)\n    return numpy.log(lh).sum()", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "### Numpy implementation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "def llh_numpy(data, mean, sigma):\n    s = (data - mean) ** 2 / (2 * (sigma ** 2))\n    pdfs = numpy.exp(- s)\n    pdfs /= numpy.sqrt(2 * numpy.pi) * sigma\n    return numpy.log(pdfs).sum()", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "### Cython implementation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, "cell_type": "code", "source": "%load_ext Cython", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 5, "cell_type": "code", "source": "%%cython\ncdef extern from \"math.h\":\n    double sqrt(double m)\n    double exp(double m)\n    double log(double m)\n\nimport cython\nimport numpy as np\n\ncimport numpy as np\nfrom numpy cimport ndarray\npi = np.pi\n\n@cython.boundscheck(False)\n@cython.wraparound(False)\ndef llh_cython(ndarray[np.float64_t, ndim=1] data, double mean, double sigma):\n    cdef int l = len(data)\n    cdef double llh = 0\n    cdef double s = 0\n    for i in xrange(l):\n        s = (data[i] - mean) ** 2 / (2 * (sigma ** 2))\n        llh += log(exp(-s) / (sqrt(2 * pi) * sigma))\n    return llh", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "### Numexpr implementation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 6, "cell_type": "code", "source": "import numexpr\nnumexpr.set_num_threads(1)\n\ndef llh_numexpr(data, mean, sigma):\n    expression = 'sum(log(exp(- (data-mean) **2  / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)))'\n    return numexpr.evaluate(expression, local_dict=dict(data=data, mean=mean, sigma=sigma, pi=numpy.pi))", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "### Parakeet implementation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 7, "cell_type": "code", "source": "import parakeet\nparakeet.config.backend = 'c'\n\n@parakeet.jit \ndef llh_parakeet(data, mean, sigma):\n    s = (data - mean) ** 2 / (2 * (sigma ** 2))\n    pdfs = numpy.exp(- s)\n    pdfs /= numpy.sqrt(2 * numpy.pi) * sigma\n    return numpy.log(pdfs).sum()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "### Theano implementation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 8, "cell_type": "code", "source": "import theano\nimport theano.tensor as T\n\nprint theano.config.device", "outputs": [{"output_type": "stream", "name": "stdout", "text": "cpu\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 9, "cell_type": "code", "source": "theano.config.openmp", "outputs": [{"execution_count": 9, "output_type": "execute_result", "data": {"text/plain": "False"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 10, "cell_type": "code", "source": "def llh_theano(data, mean, sigma):\n    s = (data - mean) ** 2 / (2 * (sigma ** 2))\n    pdfs = T.exp(- s)\n    pdfs /= T.sqrt(2 * numpy.pi) * sigma\n    return T.log(pdfs).sum()\n\nmean, sigma = T.scalars('m', 's')\nd = T.vector('data')\n\nllh_theano = theano.function([d, mean, sigma], llh_theano(d, mean, sigma))", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## C++ implementation for comparison of speed\nwe are neither passing, nor returning anything in c++. Just doing same operations in C++ for some array to compare speed.\n\nMind the overhead for creating new process - it is essential for small sizes.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 11, "cell_type": "code", "source": "%%writefile test_speed.cpp\n#include <iostream>\n#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\n// using namespace std;\n\n\nint main(int argc, char** argv) {\n    if (argc < 4){\n        std::cout << \"run with n_samples, mean, sigma!\";\n        return 1;\n    }\n    int size = atoi(argv[1]);\n    double mean = atof(argv[2]);\n    double sigma = atof(argv[3]);\n    \n    double * data = new double[size];\n    double factor = 1. / size;\n    for (int i=0; i<size; ++i){\n        data[i] = i * factor;\n    }\n    double result = 0.;\n    double s = 0.;\n    double x = 0.;\n    \n    for (int i=0; i<size; ++i){\n        x = (data[i] - mean);\n        s =  x * x / (2 * (sigma * sigma));\n        result += log(exp(-s) / (sqrt(2 * M_PI) * sigma));\n    }\n    \n    std::cout << std::endl << result << std::endl;\n    return 0;\n}", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Overwriting test_speed.cpp\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 12, "cell_type": "code", "source": "!g++ test_speed.cpp -o test_speed -O3", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 13, "cell_type": "code", "source": "def llh_cpp(data, mean, sigma):\n    size = len(data)\n    out = !./test_speed {len(data)} {mean} {sigma}", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "### Data generation, checking that all functions output the same value", "cell_type": "markdown", "metadata": {}}, {"execution_count": 14, "cell_type": "code", "source": "from collections import OrderedDict\nfunctions = OrderedDict()\nfunctions['scipy'] = llh_scipy\nfunctions['numpy'] = llh_numpy\nfunctions['cython'] = llh_cython\nfunctions['numexpr'] = llh_numexpr\nfunctions['parakeet'] = llh_parakeet\nfunctions['theano'] = llh_theano\nfunctions['c++'] = llh_cpp", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 15, "cell_type": "code", "source": "data = numpy.random.normal(size=1000000).astype('float64')\nfor name, function in functions.items():\n    print name, function(data, 0.1, 1.1)", "outputs": [{"output_type": "stream", "name": "stdout", "text": "scipy -1431620.79845\nnumpy -1431620.79845\ncython -1431620.79845\nnumexpr -1431620.79845\nparakeet -1431620.79845\ntheano -1431620.79845\nc++ None\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 16, "cell_type": "code", "source": "import timeit \nsizes = [10 ** 5, 10 ** 6, 10 ** 7, 10 ** 8]\nimport pandas\nscores = pandas.DataFrame(data=0, columns=functions.keys(), index=sizes)\nfor size in sizes:\n    for name, function in functions.items():\n        data = numpy.random.normal(size=size).astype('float64')\n        result = %timeit -o function(data, 0.1, 1.1)\n        scores.loc[size, name] = result.best", "outputs": [{"output_type": "stream", "name": "stdout", "text": "100 loops, best of 3: 10.9 ms per loop\n100 loops, best of 3: 5.97 ms per loop\n100 loops, best of 3: 10.6 ms per loop\n100 loops, best of 3: 7.01 ms per loop\n100 loops, best of 3: 6.32 ms per loop\n100 loops, best of 3: 7.7 ms per loop\n100 loops, best of 3: 17 ms per loop\n10 loops, best of 3: 131 ms per loop\n10 loops, best of 3: 62.1 ms per loop\n10 loops, best of 3: 109 ms per loop\n10 loops, best of 3: 72.6 ms per loop\n10 loops, best of 3: 64.5 ms per loop\n10 loops, best of 3: 80 ms per loop\n10 loops, best of 3: 79.8 ms per loop\n1 loops, best of 3: 1.8 s per loop\n1 loops, best of 3: 840 ms per loop\n1 loops, best of 3: 1.07 s per loop\n1 loops, best of 3: 700 ms per loop\n1 loops, best of 3: 628 ms per loop\n1 loops, best of 3: 822 ms per loop\n1 loops, best of 3: 697 ms per loop\n1 loops, best of 3: 18.9 s per loop\n1 loops, best of 3: 8.54 s per loop\n1 loops, best of 3: 10.9 s per loop\n1 loops, best of 3: 7.16 s per loop\n1 loops, best of 3: 6.25 s per loop\n1 loops, best of 3: 8.23 s per loop\n1 loops, best of 3: 6.92 s per loop\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Results (time in seconds, less is better) ", "cell_type": "markdown", "metadata": {}}, {"execution_count": 17, "cell_type": "code", "source": "scores", "outputs": [{"execution_count": 17, "output_type": "execute_result", "data": {"text/plain": "               scipy     numpy     cython   numexpr  parakeet    theano  \\\n100000      0.010891  0.005968   0.010637  0.007012  0.006323  0.007702   \n1000000     0.130639  0.062146   0.108749  0.072568  0.064520  0.079974   \n10000000    1.795048  0.840145   1.067230  0.700086  0.627737  0.821548   \n100000000  18.917289  8.539139  10.910751  7.157359  6.250242  8.229951   \n\n                c++  \n100000     0.017023  \n1000000    0.079771  \n10000000   0.696519  \n100000000  6.924702  ", "text/html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>scipy</th>\n      <th>numpy</th>\n      <th>cython</th>\n      <th>numexpr</th>\n      <th>parakeet</th>\n      <th>theano</th>\n      <th>c++</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>100000   </th>\n      <td>  0.010891</td>\n      <td> 0.005968</td>\n      <td>  0.010637</td>\n      <td> 0.007012</td>\n      <td> 0.006323</td>\n      <td> 0.007702</td>\n      <td> 0.017023</td>\n    </tr>\n    <tr>\n      <th>1000000  </th>\n      <td>  0.130639</td>\n      <td> 0.062146</td>\n      <td>  0.108749</td>\n      <td> 0.072568</td>\n      <td> 0.064520</td>\n      <td> 0.079974</td>\n      <td> 0.079771</td>\n    </tr>\n    <tr>\n      <th>10000000 </th>\n      <td>  1.795048</td>\n      <td> 0.840145</td>\n      <td>  1.067230</td>\n      <td> 0.700086</td>\n      <td> 0.627737</td>\n      <td> 0.821548</td>\n      <td> 0.696519</td>\n    </tr>\n    <tr>\n      <th>100000000</th>\n      <td> 18.917289</td>\n      <td> 8.539139</td>\n      <td> 10.910751</td>\n      <td> 7.157359</td>\n      <td> 6.250242</td>\n      <td> 8.229951</td>\n      <td> 6.924702</td>\n    </tr>\n  </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Comparison to numpy time (less is better)", "cell_type": "markdown", "metadata": {}}, {"execution_count": 18, "cell_type": "code", "source": "normalized_scores = scores.copy()\nfor column in normalized_scores.columns:\n    normalized_scores[column] /= scores['numpy']    ", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 19, "cell_type": "code", "source": "normalized_scores", "outputs": [{"execution_count": 19, "output_type": "execute_result", "data": {"text/plain": "              scipy  numpy    cython   numexpr  parakeet    theano       c++\n100000     1.824860      1  1.782218  1.174857  1.059506  1.290531  2.852221\n1000000    2.102137      1  1.749897  1.167704  1.038194  1.286867  1.283616\n10000000   2.136593      1  1.270292  0.833292  0.747177  0.977864  0.829046\n100000000  2.215363      1  1.277734  0.838183  0.731952  0.963792  0.810937", "text/html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>scipy</th>\n      <th>numpy</th>\n      <th>cython</th>\n      <th>numexpr</th>\n      <th>parakeet</th>\n      <th>theano</th>\n      <th>c++</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>100000   </th>\n      <td> 1.824860</td>\n      <td> 1</td>\n      <td> 1.782218</td>\n      <td> 1.174857</td>\n      <td> 1.059506</td>\n      <td> 1.290531</td>\n      <td> 2.852221</td>\n    </tr>\n    <tr>\n      <th>1000000  </th>\n      <td> 2.102137</td>\n      <td> 1</td>\n      <td> 1.749897</td>\n      <td> 1.167704</td>\n      <td> 1.038194</td>\n      <td> 1.286867</td>\n      <td> 1.283616</td>\n    </tr>\n    <tr>\n      <th>10000000 </th>\n      <td> 2.136593</td>\n      <td> 1</td>\n      <td> 1.270292</td>\n      <td> 0.833292</td>\n      <td> 0.747177</td>\n      <td> 0.977864</td>\n      <td> 0.829046</td>\n    </tr>\n    <tr>\n      <th>100000000</th>\n      <td> 2.215363</td>\n      <td> 1</td>\n      <td> 1.277734</td>\n      <td> 0.838183</td>\n      <td> 0.731952</td>\n      <td> 0.963792</td>\n      <td> 0.810937</td>\n    </tr>\n  </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Conclusion\n\nMany libraries claim that they can speed up number crunching in python.\nResults of this test are floating (+- 0.1), but what we can see\n1. numpy turned out to be fastest at moderate sizes of arrays\n2. numpy implementation at least not more complex than others\n3. parakeet was the only to get sensible speed up", "cell_type": "markdown", "metadata": {}}, {"source": "# Technical info", "cell_type": "markdown", "metadata": {}}, {"execution_count": 20, "cell_type": "code", "source": "import multiprocessing\nprint multiprocessing.cpu_count(), 'xeon cores'", "outputs": [{"output_type": "stream", "name": "stdout", "text": "16 xeon cores\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 21, "cell_type": "code", "source": "numpy.__version__", "outputs": [{"execution_count": 21, "output_type": "execute_result", "data": {"text/plain": "'1.9.2'"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 22, "cell_type": "code", "source": "scipy.__version__", "outputs": [{"execution_count": 22, "output_type": "execute_result", "data": {"text/plain": "'0.14.0'"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 23, "cell_type": "code", "source": "import cython\ncython.__version__", "outputs": [{"execution_count": 23, "output_type": "execute_result", "data": {"text/plain": "'0.21.1'"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 24, "cell_type": "code", "source": "numexpr.__version__", "outputs": [{"execution_count": 24, "output_type": "execute_result", "data": {"text/plain": "'2.4'"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 25, "cell_type": "code", "source": "parakeet.__version__", "outputs": [{"execution_count": 25, "output_type": "execute_result", "data": {"text/plain": "'0.23.2'"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 26, "cell_type": "code", "source": "theano.__version__", "outputs": [{"execution_count": 26, "output_type": "execute_result", "data": {"text/plain": "'0.7.0'"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 27, "cell_type": "code", "source": "!g++ -v", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Using built-in specs.\r\nCOLLECT_GCC=g++\r\nCOLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper\r\nTarget: x86_64-linux-gnu\r\nConfigured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu\r\nThread model: posix\r\ngcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) \r\n"}], "metadata": {"collapsed": false, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.3", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}