
@moble
Created November 16, 2017 19:11
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Speed of matrix multiplications\n",
"\n",
"Alex, Hugo, and I discussed why [`clifford`](https://github.com/arsenovic/clifford) is slow, and they suggested that it's because of the matrix multiplications that `clifford` uses to perform geometric products, like [this piece of code](https://github.com/arsenovic/clifford/blob/master/clifford/__init__.py#L725). Basically, any multivector in $n$ dimensions is represented as an array of the $2^n$ possible coefficients of the various basis blades, so any product is represented by an array of dimension $2^n$x$2^n$x$2^n$ where we contract the two multivectors on two of the indices, and the remaining index ranges over the $2^n$ coefficients of the output multivector.\n",
"\n",
"Numpy has various options for the backend libraries that do matrix multiplication. There's a nice overview of this [here](http://markus-beuckelmann.de/blog/boosting-numpy-blas.html), which also shows some benchmarks. The takeaway is that MKL is probably best if you have an Intel chip, and OpenBLAS is probably best otherwise. You can see which libraries are linked with your numpy installation by running `numpy.show_config()`. There are various ways to install these. I think `conda` automatically installs the best one, but I'm not sure; I know that `conda` gives you a choice.\n",
"\n",
"Here, I'll try a few numpy functions that can do the matrix multiplication in question and show their timings. The upshot is that each multiplication can be done a bit more efficiently using python's `@` operator, but in any case will take tens of microseconds for $n=5$. Obviously, the sparse bitwise representation discussed by [Dorst et al.](http://www.geometricalgebra.net/) will be better; I would expect typical results (depending on sparseness) tens to hundreds of times faster."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2017-11-16T19:05:59.739813Z",
"start_time": "2017-11-16T19:05:59.520914Z"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"np.random.seed(1234)\n",
"\n",
"def setup_arrays(n_dimensions):\n",
" size = 2**n_dimensions\n",
" return (\n",
" np.random.random((1, size)), # Random multivector\n",
" np.random.choice([-1, 0, 1], (size, size, size)), # Array with elements randomly chosen from [-1, 0, 1]\n",
" np.random.random((size, 1)) # Random multivector\n",
" )"
]
},
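{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference (this cell and the next are just an illustration, not code from `clifford`; the function name `explicit_product` is made up for this sketch), the contraction described above can be written out as explicit loops: the output coefficient is $c_l = \\sum_{j,k} a_j \\, M_{ljk} \\, b_k$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def explicit_product(a, mt, b):\n",
"    \"\"\"Contract a (1, N) row, an (N, N, N) multiplication table, and an (N, 1) column with plain loops.\"\"\"\n",
"    size = mt.shape[0]\n",
"    c = np.zeros(size)\n",
"    for l in range(size):\n",
"        for j in range(size):\n",
"            for k in range(size):\n",
"                c[l] += a[0, j] * mt[l, j, k] * b[k, 0]\n",
"    return c\n",
"\n",
"a, mt, b = setup_arrays(3)  # keep this small; the explicit loops are slow\n",
"np.allclose(explicit_product(a, mt, b), np.einsum('ij,ljk,ki', a, mt, b))"
]
},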
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, I'll check to make sure that my methods of computing the result all give the same answer. Since each term in the output is the result of $\\sim 4^5 \\approx 1000$ operations, I'll demand that they're the same to about 1000 times machine precision:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2017-11-16T19:08:54.241136Z",
"start_time": "2017-11-16T19:08:54.233203Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a, mt, b = setup_arrays(5)\n",
"\n",
"dot = np.dot(a, np.dot(mt, b)).flatten() # This is the current implementation in the code\n",
"einsum = np.einsum('ij,ljk,ki', a, mt, b) # This is the function I suggested on the phone call\n",
"at = (a @ mt @ b).flatten() # This also works\n",
"\n",
"np.allclose(dot, einsum, rtol=2e-13, atol=0) and np.allclose(dot, at, rtol=2e-13, atol=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks good; they all agree to within the tolerances.\n",
"\n",
"Now let's see how long each one takes:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2017-11-16T19:06:05.117215Z",
"start_time": "2017-11-16T19:06:00.354170Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"58.5 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%%timeit a, mt, b = setup_arrays(5)\n",
"\n",
"np.dot(a, np.dot(mt, b))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2017-11-16T19:06:09.897928Z",
"start_time": "2017-11-16T19:06:05.118990Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"58.7 µs ± 2.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%%timeit a, mt, b = setup_arrays(5)\n",
"\n",
"np.einsum('ij,ljk,ki', a, mt, b)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2017-11-16T19:06:13.278976Z",
"start_time": "2017-11-16T19:06:09.899782Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"41.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%%timeit a, mt, b = setup_arrays(5)\n",
"\n",
"(a @ mt @ b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interestingly, the `@` operator is consistently faster than either of the other options.\n",
"\n",
"Presumably, this all depends a lot on the matrix libraries numpy is using. Here's what I have:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2017-11-16T19:06:13.287358Z",
"start_time": "2017-11-16T19:06:13.280942Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"blas_mkl_info:\n",
" libraries = ['mkl_rt', 'pthread']\n",
" library_dirs = ['/Users/boyle/.continuum/anaconda3/lib']\n",
" define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]\n",
" include_dirs = ['/Users/boyle/.continuum/anaconda3/include']\n",
"blas_opt_info:\n",
" libraries = ['mkl_rt', 'pthread']\n",
" library_dirs = ['/Users/boyle/.continuum/anaconda3/lib']\n",
" define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]\n",
" include_dirs = ['/Users/boyle/.continuum/anaconda3/include']\n",
"lapack_mkl_info:\n",
" libraries = ['mkl_rt', 'pthread']\n",
" library_dirs = ['/Users/boyle/.continuum/anaconda3/lib']\n",
" define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]\n",
" include_dirs = ['/Users/boyle/.continuum/anaconda3/include']\n",
"lapack_opt_info:\n",
" libraries = ['mkl_rt', 'pthread']\n",
" library_dirs = ['/Users/boyle/.continuum/anaconda3/lib']\n",
" define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]\n",
" include_dirs = ['/Users/boyle/.continuum/anaconda3/include']\n"
]
}
],
"source": [
"np.show_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So I'm using MKL."
]
}
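,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix: sketch of a sparse, bitwise product\n",
"\n",
"As a rough illustration of the sparse bitwise representation mentioned in the introduction (a minimal sketch, not an optimized implementation; the function names are made up here, and a Euclidean metric is assumed, in which every basis vector squares to $+1$): store a multivector as a `dict` mapping each basis blade's bitmap to its coefficient, multiply blades with XOR, and get the sign by counting the basis-vector swaps needed to reach canonical order."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def reordering_sign(a, b):\n",
"    \"\"\"Sign from the number of swaps needed to bring e_a * e_b into canonical order (Euclidean metric).\"\"\"\n",
"    a >>= 1\n",
"    swaps = 0\n",
"    while a != 0:\n",
"        swaps += bin(a & b).count('1')\n",
"        a >>= 1\n",
"    return 1.0 if swaps % 2 == 0 else -1.0\n",
"\n",
"def sparse_geometric_product(x, y):\n",
"    \"\"\"Geometric product of multivectors stored as {bitmap: coefficient} dicts.\"\"\"\n",
"    out = {}\n",
"    for ba, ca in x.items():\n",
"        for bb, cb in y.items():\n",
"            bitmap = ba ^ bb  # the product blade is the symmetric difference of the factors\n",
"            out[bitmap] = out.get(bitmap, 0.0) + reordering_sign(ba, bb) * ca * cb\n",
"    return out\n",
"\n",
"# Example: (1 + 2*e1) * (3*e2) should give 3*e2 + 6*e12\n",
"sparse_geometric_product({0b00: 1.0, 0b01: 2.0}, {0b10: 3.0})"
]
}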
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@hugohadfield

This is really interesting; the matrices are ludicrously large and very sparse. I had a quick look at scipy's sparse matrix library to see if there was an easy two- or three-line fix, but unfortunately it does not like dealing with 3D arrays!
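
For reference, here's a minimal sketch (just an illustration, not code from `clifford`; the variable names are made up) of one possible way around the 3D limitation: flatten the (2^n, 2^n, 2^n) table over its first two axes so scipy.sparse can hold it as a 2D matrix, then finish the contraction with dense products.

    import numpy as np
    from scipy import sparse

    n = 5
    size = 2**n
    a = np.random.random((1, size))                        # random multivector (row)
    mt = np.random.choice([-1, 0, 1], (size, size, size))  # dense multiplication table
    b = np.random.random((size, 1))                        # random multivector (column)

    # Store the table as an (N*N, N) sparse matrix by flattening its first two axes
    table = sparse.csr_matrix(mt.reshape(size * size, size))

    # intermediate[l, j] = sum_k mt[l, j, k] * b[k, 0]
    intermediate = np.asarray(table.dot(b)).reshape(size, size)

    # result[l] = sum_j a[0, j] * intermediate[l, j]
    result = (intermediate @ a.T).flatten()

    print(np.allclose(result, np.einsum('ij,ljk,ki', a, mt, b)))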

@chrisjldoran

Matrices are nearly always going to be the slowest way to implement the geometric product; they are a worst-case scenario. The only reason you might want to consider them is if there is some serious hardware optimisation you can take advantage of (such as 4x4 matrix multiplication on a GPU). There is also some overhead in converting to and from matrices and multivectors, which involves a bunch of further multiplies and traces. I would always go for a sparse representation in terms of explicit multivector components, and then figure out how to optimise the (highly parallel) multivector product.

@moble
Author

moble commented Nov 17, 2017

@chrisjldoran There's another reason to consider matrices: Someone else has already done the coding! :) But yes, the goal is certainly to get to a better implementation of the products.

@hugohadfield Have you tried these timings on your installation? In particular, I'm wondering if the @ operator really is faster for many configurations, or if it's something peculiar to mine.


ghost commented Nov 20, 2017

@moble Hi guys, I'm also really keen to see better performance in the clifford library. Just chiming in to say I executed the notebook, and the `@` operator is fastest for me too (dot 56 µs ± 1.29 µs, einsum 64.4 µs ± 804 ns, @ 43.1 µs ± 457 ns).
Output of show_config():

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['libopenblas_v0.2.20_mingwpy', 'libopenblas_v0.2.20_mingwpy']
    library_dirs = ['c:\\opt\\64\\lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['libopenblas_v0.2.20_mingwpy', 'libopenblas_v0.2.20_mingwpy']
    library_dirs = ['c:\\opt\\64\\lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['libopenblas_v0.2.20_mingwpy', 'libopenblas_v0.2.20_mingwpy']
    library_dirs = ['c:\\opt\\64\\lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['libopenblas_v0.2.20_mingwpy', 'libopenblas_v0.2.20_mingwpy']
    library_dirs = ['c:\\opt\\64\\lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
