@GaelVaroquaux
Last active September 15, 2023 03:58
Copy-less bindings of C-generated arrays with Cython

Cython example of exposing C-computed arrays in Python without data copies

The goal of this example is to show how an existing C codebase for numerical computing (here c_code.c) can be wrapped in Cython to be exposed in Python.

The meat of the example is that the data is allocated in C, but exposed in Python without a copy using the PyArray_SimpleNewFromData numpy function in the Cython file cython_wrapper.pyx.
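
The same zero-copy idea can be tried from plain Python without compiling anything. The sketch below swaps Cython for `ctypes`, which is a different technique from the one used in this gist. A ctypes buffer stands in for the memory returned by `compute` (an assumption for illustration; no C code is called), and `numpy.ctypeslib.as_array` wraps a raw pointer to it as an ndarray that shares the memory instead of copying it.

```python
import ctypes

import numpy as np


def fill_range(n):
    """Hypothetical stand-in for compute(): n C ints filled with 0..n-1."""
    buf = (ctypes.c_int * n)()
    for i in range(n):
        buf[i] = i
    return buf


buf = fill_range(10)
# Look at the buffer through a raw int pointer, as a C function would return it.
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_int))
# Wrap the pointer without copying; a bare pointer needs an explicit shape.
arr = np.ctypeslib.as_array(ptr, shape=(10,))

arr[0] = 42               # a write through numpy...
print(buf[0])             # ...is visible on the "C" side: 42
print(arr.flags.owndata)  # False: numpy does not own, and will not free, this memory
```

Because numpy does not own the memory here, the sketch keeps `buf` referenced for as long as `arr` is in use; managing that lifetime is exactly the job `ArrayWrapper` does in the Cython version.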

The purpose of the ArrayWrapper object is to be garbage-collected by Python when the ndarray Python object disappears; the memory is then freed. Note that there is no control over when Python will deallocate the memory. If the memory is still being used by the C code, please refer to the following blog post by Travis Oliphant:

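http://blog.enthought.com/python/numpy-arrays-with-pre-allocated-memory (archived copy: http://web.archive.org/web/20160321001549/http://blog.enthought.com/python/numpy-arrays-with-pre-allocated-memory/)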
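
To see the ownership mechanism in isolation, here is a pure-Python analogue of `ArrayWrapper`. It assumes a ctypes buffer plays the role of the malloc'd block and a `weakref.finalize` callback plays the role of `__dealloc__`/`free`; the names `OwnedBuffer`, `py_compute_sketch` and `freed` are hypothetical. Instead of `PyArray_SimpleNewFromData`, numpy builds the array from the object's `__array_interface__` and stores the object itself in `ndarray.base`, so the owner lives exactly as long as the array does.

```python
import ctypes
import weakref

import numpy as np

freed = []  # hypothetical log standing in for calls to free()


class OwnedBuffer:
    """Hypothetical analogue of ArrayWrapper: owns a block of C ints."""

    def __init__(self, size):
        # ctypes allocates the memory, filled with 0..size-1 like compute()
        self._buf = (ctypes.c_int * size)(*range(size))
        self.size = size
        # Runs once this object is garbage-collected, like __dealloc__
        weakref.finalize(self, freed.append, size)

    @property
    def __array_interface__(self):
        # Describe the existing memory to numpy instead of copying it.
        return {
            "shape": (self.size,),
            "typestr": np.dtype(np.intc).str,
            "data": (ctypes.addressof(self._buf), False),  # (address, read-only?)
            "version": 3,
        }


def py_compute_sketch(size):
    # Zero-copy; the returned array holds the owner in its .base attribute.
    return np.asarray(OwnedBuffer(size))


a = py_compute_sketch(10)
print(type(a.base).__name__, freed)  # OwnedBuffer []  (owner still alive)
del a                                # the last reference to the owner goes away
print(freed)                         # [10] in CPython, where refcounting is immediate
```

As in the Cython version, nothing decides *when* the memory goes away except the reference count of the array that points to it.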

You will need Cython, numpy, and a C compiler.

To build the C extension in place, run:

$ python setup.py build_ext -i

To test the C-Python bindings, run the test.py file.

Files
c_code.c: The C code to bind. Knows nothing about Python
cython_wrapper.pyx: The Cython code implementing the binding
setup.py: The configure/make/install script
test.py: Python code using the C extension

Author: Gael Varoquaux
License: BSD 3 clause
c_code.c

/* Small C file creating an array to demo C -> Python data passing
 *
 * Author: Gael Varoquaux
 * License: BSD 3 clause
 */

#include <stdlib.h>

int *compute(int size)
{
    /* Declarations come first: older MSVC C compilers (C89) reject
     * declarations that follow statements. */
    int *array;
    int i;
    array = malloc(sizeof(int) * size);
    if (array == NULL)
        return NULL;
    for (i = 0; i < size; i++)
    {
        array[i] = i;
    }
    return array;
}

cython_wrapper.pyx

""" Small Cython file to demonstrate the use of PyArray_SimpleNewFromData
in Cython to create an array from already allocated memory.

Cython enables mixing C-level calls and Python-level calls in the same
file with a Python-like syntax and easy type coercion. See
http://cython.org for more information
"""

# Author: Gael Varoquaux
# License: BSD 3 clause

# Declare the prototype of the C function we are interested in calling
cdef extern from "c_code.c":
    int *compute(int size)

from libc.stdlib cimport free
from cpython cimport PyObject, Py_INCREF

# Import the Python-level symbols of numpy
import numpy as np

# Import the C-level symbols of numpy
cimport numpy as np

# Numpy must be initialized. When using numpy from C or Cython you must
# _always_ do that, or you will have segfaults
np.import_array()

# We need to build an array-wrapper class to deallocate our array when
# the Python object is deleted.

cdef class ArrayWrapper:
    cdef void* data_ptr
    cdef int size

    cdef set_data(self, int size, void* data_ptr):
        """ Set the data of the array

        This cannot be done in the constructor as it must receive C-level
        arguments.

        Parameters:
        -----------
        size: int
            Length of the array.
        data_ptr: void*
            Pointer to the data
        """
        self.data_ptr = data_ptr
        self.size = size

    def __array__(self):
        """ Here we use the __array__ method, which is called when numpy
            tries to get an array from the object."""
        cdef np.npy_intp shape[1]
        shape[0] = <np.npy_intp> self.size
        # Create a 1D array, of length 'size', of C ints (NPY_INT)
        ndarray = np.PyArray_SimpleNewFromData(1, shape,
                                               np.NPY_INT, self.data_ptr)
        return ndarray

    def __dealloc__(self):
        """ Frees the array. This is called by Python when all the
            references to the object are gone. """
        free(<void*>self.data_ptr)


def py_compute(int size):
    """ Python binding of the 'compute' function in 'c_code.c' that does
        not copy the data allocated in C.
    """
    cdef int *array
    cdef np.ndarray ndarray
    # Call the C function
    array = compute(size)
    if array == NULL and size > 0:
        raise MemoryError()

    array_wrapper = ArrayWrapper()
    array_wrapper.set_data(size, <void*> array)
    ndarray = np.array(array_wrapper, copy=False)
    # Assign our object to the 'base' of the ndarray object
    ndarray.base = <PyObject*> array_wrapper
    # Increment the reference count, as the above assignment was done in
    # C, and Python does not know that there is this additional reference
    Py_INCREF(array_wrapper)

    return ndarray

""" Example of building a module with a Cython file. See the distutils
and numpy distutils documentations for more info:
http://docs.scipy.org/doc/numpy/reference/distutils.html
"""
# Author: Gael Varoquaux
# License: BSD 3 clause
import numpy
from Cython.Distutils import build_ext
def configuration(parent_package='', top_path=None):
""" Function used to build our configuration.
"""
from numpy.distutils.misc_util import Configuration
# The configuration object that hold information on all the files
# to be built.
config = Configuration('', parent_package, top_path)
config.add_extension('cython_wrapper',
sources=['cython_wrapper.pyx'],
# libraries=['m'],
depends=['c_code.c'],
include_dirs=[numpy.get_include()])
return config
if __name__ == '__main__':
# Retrieve the parameters of our local configuration
params = configuration(top_path='').todict()
# Override the C-extension building so that it knows about '.pyx'
# Cython files
params['cmdclass'] = dict(build_ext=build_ext)
# Call the actual building/packaging function (see distutils docs)
from numpy.distutils.core import setup
setup(**params)
""" Script to smoke-test our Cython wrappers
"""
# Author: Gael Varoquaux
# License: BSD 3 clause
import numpy as np
import cython_wrapper
a = cython_wrapper.py_compute(10)
print 'The array created is %s' % a
print 'It carries a reference to our deallocator: %s ' % a.base
np.testing.assert_allclose(a, np.arange(10))
@AlDanial

This is a terrific example; perfect starting point for a C/Python integration I have in mind. The code builds and runs cleanly on Fedora 14 and RHEL 4.x and 5.x machines I have access to. However, a colleague with a 64-bit Windows 7 machine gets a build failure:

The error message:
C:\Users\steve\Desktop\gist1249305-8c4e86c6c0d86e796b27ee5a939fdd7b378d5058\gist1249305-8c4e86c6c0d86e796b27ee5a939fdd7b378d5058> python setup.py build_ext --inplace
running build_ext
No module named msvccompiler in numpy.distutils; trying from distutils
cythoning .\cython_wrapper.pyx to .\cython_wrapper.c
building 'cython_wrapper' extension
creating build
creating build\temp.win32-2.7
creating build\temp.win32-2.7\Release
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python27\lib\site-packages\numpy\core\include -IC:\Python27\include -IC:\Python27\PC /Tc.\cython_wrapper.c /Fobuild\temp.win32-2.7\Release.\cython_wrapper.obj
Found executable C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\cl.exe
cython_wrapper.c
c:\users\steve\desktop\gist1249305-8c4e86c6c0d86e796b27ee5a939fdd7b378d5058\gist1249305-8c4e86c6c0d86e796b27ee5a939fdd7b378d5058\c_code.c(13) : error C2143: syntax error : missing ';' before 'type'

He can build and run other Cython projects, so the error above is a surprise (given the success on Linux). Do you have access to a Windows 7 box to test this code on? Or should we take this up on the Cython mailing list?

@AlDanial

One more minor comment: the build command cited above,

$ python setup.py build_ext --i

does not work on any of my machines ("error: option --i not a unique prefix"); I had to use --inplace instead of just --i.

@GaelVaroquaux
Author

GaelVaroquaux commented Feb 22, 2012 via email

@GaelVaroquaux
Author

GaelVaroquaux commented Feb 22, 2012 via email

@Qwlouse

Qwlouse commented Feb 8, 2013

Hey,
First of all, thank you a lot for this great example. It really helped me out. I am a little confused, though, so let me ask you this:

Why do you have two calls to create an ndarray (one in __array__ and one in py_compute)?
Couldn't you just add an as_ndarray() method to the ArrayWrapper and return the final numpy array like this:

 def as_ndarray(self):
    cdef np.npy_intp shape[1]
    cdef np.ndarray ndarray
    shape[0] = <np.npy_intp> self.size
    ndarray = np.PyArray_SimpleNewFromData(1, shape, np.NPY_INT, self.data_ptr)
    ndarray.base = <PyObject*> self
    Py_INCREF(self)
    return ndarray

and then just return arraywrapper.as_ndarray() in py_compute.

This should work the same way, or am I missing something?

@aldanor

aldanor commented Jul 15, 2014

Wonder if this would work with structs / record arrays?
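
One possible answer, sketched from Python with ctypes rather than Cython: numpy can view a block of C structs as a structured (record) array without copying, provided the dtype reproduces the struct's field types and padding. Here a ctypes `Structure` named `Point` is a hypothetical struct standing in for the C side, and `align=True` asks numpy to pad fields the way a C compiler does.

```python
import ctypes

import numpy as np


class Point(ctypes.Structure):
    # Hypothetical C struct: struct { int id; double value; }
    _fields_ = [("id", ctypes.c_int), ("value", ctypes.c_double)]


# Matching numpy dtype; align=True inserts the same padding as the C compiler.
point_dtype = np.dtype([("id", np.intc), ("value", np.double)], align=True)
assert point_dtype.itemsize == ctypes.sizeof(Point), "struct layout mismatch"

points = (Point * 3)()  # stand-in for a C-allocated array of structs
for i in range(3):
    points[i].id = i
    points[i].value = i / 2

# Reinterpret the same bytes as a record array: no copy is made.
rec = np.frombuffer(points, dtype=point_dtype)

points[1].value = 9.0   # a change made on the "C" side...
print(rec["value"][1])  # ...is seen by numpy: 9.0
```

The itemsize check is the important part: if the dtype and the struct disagree on padding, the fields will silently read the wrong bytes.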

@syrte

syrte commented Oct 23, 2014

How about this post:
http://stackoverflow.com/questions/23872946/force-numpy-ndarray-to-take-ownership-of-its-memory-in-cython/
Would it be safe to let numpy free the memory?

@dashesy

dashesy commented Apr 19, 2016

@syrte did you figure out if it is safe to rely on PyArray_ENABLEFLAGS? Also, do you know how to get the type_num of a dtype object to use in PyArray_SimpleNewFromData?
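
On the second question: at the Python level, a dtype's type number is exposed as `dtype.num`, and it is the same integer as the corresponding `NPY_*` enum constant (for example `NPY_INT`, or a complex type number like the `complex_typenum` used further down). A small sketch; `type_num_for` is a hypothetical helper name.

```python
import numpy as np


def type_num_for(dtype_like):
    """Return the NPY_* type number numpy uses for a dtype (hypothetical helper)."""
    return np.dtype(dtype_like).num


print(type_num_for(np.intc))        # 5, the value of NPY_INT
print(type_num_for(np.float64))     # 12, the value of NPY_DOUBLE
print(type_num_for(np.complex128))  # 15, the value of NPY_CDOUBLE
```

Since the result is a plain Python int, it can be passed from Cython wherever `PyArray_SimpleNewFromData` expects a type number.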

@fredRos

fredRos commented Jul 25, 2017

Thanks for this useful complete example. When I copied over portions into my code, I ran into the issue that the assignment

ndarray.base = <PyObject*> array_wrapper

didn't work. It's because I forgot to declare the ndarray as a cython variable, which is done properly in the gist.

cdef np.ndarray ndarray

Independent of that code-copying issue, I think the easiest way to have the memory freed is to set PyArray_ENABLEFLAGS(arr, np.NPY_OWNDATA), as @syrte asked, but this example demonstrates how to implement a custom delete function. I need to call fftw_free instead of free, and this is the only way I found that achieves this. 💯

The suggestion by @Qwlouse avoids code duplication when the ArrayWrapper is needed multiple times and works just as well. In my application, the array is 2D and I have

cdef class ArrayWrapper:
    """Wrap an array allocated in C that has to be deleted by `galario_free`"""
    cdef void* data_ptr
    cdef int nx, ny

    cdef set_data(self, int nx, int ny, void* data_ptr):
        """ Set the data of the array
        This cannot be done in the constructor as it must receive C-level
        arguments.

        Parameters:
        -----------
        nx: int
            Number of image rows
        ny: int
            Number of image columns
        data_ptr: void*
            Pointer to the data
        """
        self.data_ptr = data_ptr
        self.nx = nx
        self.ny = ny

    cdef as_ndarray(self, int nx, int ny, void* data_ptr):
        """Create an `ndarray` that doesn't own the memory, we do."""
        cdef np.npy_intp shape[2]
        cdef np.ndarray ndarray

        self.set_data(nx, ny, data_ptr)

        shape[:] = (self.nx, int(self.ny/2)+1)

        # Create a 2D array, of length `nx*ny/2+1`
        ndarray = np.PyArray_SimpleNewFromData(2, shape, complex_typenum, self.data_ptr)
        ndarray.base = <PyObject*> self

        # without this, data would be cleaned up right away
        Py_INCREF(self)
        return ndarray

    def __dealloc__(self):
        """ Frees the array. This is called by Python when all the
        references to the object are gone. """
        print("Deallocating array")
        my_custom_free(self.data_ptr)

...
def fftshift(double[:,::1] data):
    nx, ny = data.shape[0], data.shape[1]
    # operate on (nx, ny) array and return a (nx, ny/2+1) array that we have to deallocate
    cdef void* res = my_C_function(nx, ny, <void*>&data[0,0])

    return ArrayWrapper().as_ndarray(nx, ny, res)
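
The same pattern can be mimicked in pure Python to check the lifetime logic before writing the Cython. In this sketch a 2D complex array of shape `(nx, ny // 2 + 1)` (equal to `int(ny/2)+1` for positive `ny`) is exposed without a copy, and a caller-supplied function runs when the last array reference disappears, in the spirit of calling `fftw_free` instead of `free`. `Complex2DOwner` and `fake_free` are hypothetical; ctypes allocates the memory in place of `my_C_function`, and `fake_free` only records the call.

```python
import ctypes
import weakref

import numpy as np

released = []  # addresses the custom deallocator was asked to free


def fake_free(address):
    """Hypothetical stand-in for a custom deallocator such as fftw_free."""
    released.append(address)


class Complex2DOwner:
    """Owns an (nx, ny // 2 + 1) block of complex doubles (hypothetical)."""

    def __init__(self, nx, ny, free_func):
        self.shape = (nx, ny // 2 + 1)
        n = self.shape[0] * self.shape[1]
        # complex128 is laid out as two doubles (real, imag) per element.
        self._buf = (ctypes.c_double * (2 * n))()
        self._addr = ctypes.addressof(self._buf)
        # Call the custom deallocator when this owner is garbage-collected.
        weakref.finalize(self, free_func, self._addr)

    @property
    def __array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": np.dtype(np.complex128).str,
            "data": (self._addr, False),
            "version": 3,
        }

    def as_ndarray(self):
        # Zero-copy view; the returned array keeps this owner alive via .base
        return np.asarray(self)


spec = Complex2DOwner(4, 6, fake_free).as_ndarray()
print(spec.shape)     # (4, 4)
spec[0, 0] = 1 + 2j
del spec              # in CPython the owner is collected here and fake_free runs
print(len(released))  # 1
```
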

@hankliu5

hankliu5 commented Jul 1, 2019

Thanks for the example. It was extremely useful for understanding how to bind Cython with NumPy.

@danieldanciu

danieldanciu commented Oct 5, 2021

return np.array(array_wrapper)

Should probably be

return np.array(array_wrapper, copy=False)

otherwise a copy of the array will be made anyway, right?
(I am assuming that the reference count for array_wrapper will be correctly updated so that it's not garbage collected)
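
This can be checked from Python with an object whose `__array__` returns an existing array (`Wrapper` is a hypothetical stand-in for `ArrayWrapper`). `np.array` with its default `copy=True` yields new memory, while `np.asarray` or `copy=False` reuses the original buffer, and `np.shares_memory` tells the cases apart. The `dtype=None, copy=None` parameters follow the `__array__` signature NumPy 2 expects; NumPy 1.x simply does not pass `copy`.

```python
import numpy as np


class Wrapper:
    """Hypothetical stand-in for ArrayWrapper: hands numpy an existing array."""

    def __init__(self, data):
        self._data = data

    def __array__(self, dtype=None, copy=None):
        if copy:
            return self._data.copy()  # honour an explicit copy request
        return self._data             # otherwise expose the memory as-is


original = np.arange(10)
w = Wrapper(original)

copied = np.array(w)    # default copy=True: new memory
shared = np.asarray(w)  # no copy when none is needed

print(np.shares_memory(copied, original))  # False
print(np.shares_memory(shared, original))  # True
```
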

@SeanDS
Copy link

SeanDS commented Feb 25, 2022

The original blog post is no longer available, but I found it on the archive: http://web.archive.org/web/20160321001549/http://blog.enthought.com/python/numpy-arrays-with-pre-allocated-memory/
