
@sklam
Created January 31, 2018 16:57
Plan to rewrite gufunc in numba

(G)Ufunc Rewrite Plan

Introduction

This document describes the reasons for rewriting the gufunc support in numba and sets the goals for the rewrite.

Background

Numba supports the creation of numpy ufuncs and gufuncs using the @vectorize and @guvectorize decorators, respectively. These decorators provide an easy way to create ufuncs and gufuncs without sacrificing execution performance (see Appendix 1). To use these decorators, users provide a kernel function. For ufuncs, the kernel takes scalar arguments only. For gufuncs, the kernel takes N-d array slices.
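To make the kernel distinction concrete, here is a minimal sketch using NumPy's pure-Python helpers (np.frompyfunc and np.vectorize) instead of numba's decorators, so it runs without numba installed; the semantics of the two kernel styles are the same:

```python
import numpy as np

# ufunc-style kernel: scalar arguments only
def add_scalar(x, y):
    return x + y

add_u = np.frompyfunc(add_scalar, 2, 1)   # applied elementwise

# gufunc-style kernel: operates on 1-d core slices declared by a
# shape signature; '(n),(n)->()' is an inner product
def inner(x, y):
    return (x * y).sum()

inner_g = np.vectorize(inner, signature='(n),(n)->()')

a = np.arange(3.0)      # [0., 1., 2.]
b = np.ones(3)
assert (add_u(a, b) == a + 1).all()   # elementwise add
assert inner_g(a, b) == 3.0           # 0*1 + 1*1 + 2*1
```

With numba, the same kernels would be decorated with @vectorize and @guvectorize and compiled to native code instead of looping in Python.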

Issues with the current implementation

This section describes the problems and limitations of the current implementation of the (g)ufunc support.

Issues due to legacy code in Numba

The @vectorize and @guvectorize decorators for ufuncs and gufuncs, respectively, have been available since very early versions, when numba's feature set was small. As numba's feature set grows, the limitations of the current implementation become more obvious.

  • Ufuncs and gufuncs should be the same object. Numba's gufunc creation pipeline is distinct from the ufunc creation pipeline. However, ufuncs should be considered a special case of gufuncs where all parameters are of scalar type. Combining the pipelines will reduce code duplication.
  • Gufuncs lack dynamic type inference. Numba's gufunc support still requires a type declaration, while the ufunc support can perform dynamic type inference.
  • Numba @jit'ed functions can't consume gufuncs. A numba @jit'ed function can call ufuncs by directly emitting the broadcasting and looping logic into the callsite. Gufuncs are more complicated, and we haven't implemented a way to use a gufunc directly in compiled code. (Note: parfor has hacks to use gufunc kernels directly)

Issues due to limitations of NumPy (g)ufunc

  • NumPy (g)ufuncs are limited to array parameters. The new parallel accelerator features leverage parallel gufuncs as the building block of parallel loops. The new features will fail if the loops reference non-array-convertible types. (Note: a non-array-convertible type means any type that is not an array and not safely convertible to an array)
  • No array contiguity information. When the numpy (g)ufunc machinery invokes the kernel function, there is no guarantee of the contiguity (C order? Fortran order?) of the array parameters. This limits potential performance gains from automatic SIMD vectorization.
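The contiguity issue can be seen with plain NumPy: the slices a gufunc kernel receives are views, and a view's memory layout depends on how the caller's array was laid out, so the kernel cannot assume C (row-major) or Fortran (column-major) order:

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

row = a[0]      # a row of a C-ordered array: a contiguous view
col = a.T[0]    # a "row" of the transpose: same data, strided walk

assert row.flags['C_CONTIGUOUS']
assert not col.flags['C_CONTIGUOUS']   # stride is 4 elements, not 1
```

Both views are legal gufunc kernel inputs, which is why the kernel compiler cannot emit aggressive SIMD code without a contiguity guarantee.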

Goals of the new gufunc system

This section lists the features of the new gufunc system.

  1. A single gufunc type where ufunc is just a special case.
  2. A gufunc implementation that is independent of numpy and python.
  3. Gufunc to be first-class numba type.
    • First-class gufuncs can be passed as parameters and used as return values.
    • Currently, no callable types in numba are first-class.
  4. Supports any numba data types
    • Element type (dtypes): tuples, user defined structures
    • Container types: any sequence types, ragged array...
  5. Ahead-of-time (AoT) compilation. (Depends on [2]) Precompiling gufunc into native shared library can avoid compilation overhead at deployment.
  6. Usable with or without numba runtime (NRT).
    • The NRT provides support such as reference counting.
    • This will depend on the features used in the gufunc kernel.
  7. Callable from C code. (Depends on [2]; related to [5, 6])
  8. Guarantee of array contiguity in the kernel, to enable aggressive optimization.
  9. Hardware heterogeneity
    • Heterogeneous from the perspective of the gufunc's user. The gufunc will abstract away how the kernel is dispatched on different hardware.
    • CPU, multicore, and GPU targets.
    • A gufunc for a certain HW target can be inlined at the callsite.
    • Does not require automatic HW target selection.
  10. Feature extensions
    • Passthru/non-broadcasting argument.
    • Flexible type signature; i.e.
      • float32, int32, Any, Any -> int32, float32
        • 2nd, 3rd arguments can be any type
      • T, T, Any, Any -> T, T
        • 2nd, 3rd arguments can be any type
        • T is like C++ template parameter
  11. Dynamic type inference when a type declaration is not provided.
    • Like how ufuncs (dufuncs) currently work.
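As a rough illustration of the flexible type signatures in goal 10, the following hypothetical matcher (parse_sig and match are invented names for this sketch, not numba API) resolves Any wildcards and single-letter template variables like T against concrete argument types, using overload resolution by first compatible signature:

```python
def parse_sig(sig):
    """Split 'in1, in2 -> out1, out2' into input and output type lists."""
    ins, outs = sig.split('->')
    return ([t.strip() for t in ins.split(',')],
            [t.strip() for t in outs.split(',')])

def match(sig, argtypes):
    """Return the concrete output types, or None if the signature
    does not accept these argument types."""
    ins, outs = parse_sig(sig)
    if len(ins) != len(argtypes):
        return None
    binding = {}
    for pat, ty in zip(ins, argtypes):
        if pat == 'Any':                      # wildcard: accepts anything
            continue
        if len(pat) == 1 and pat.isupper():   # template var, like C++ 'T'
            if binding.setdefault(pat, ty) != ty:
                return None                   # inconsistent binding
        elif pat != ty:                       # concrete type must match
            return None
    return [binding.get(p, p) for p in outs]

# resolve against the first compatible signature
sigs = ['float32, int32, Any, Any -> int32, float32',
        'T, T, Any, Any -> T, T']
args = ['float64', 'float64', 'int32', 'int32']
result = None
for s in sigs:
    result = match(s, args)
    if result is not None:
        break
assert result == ['float64', 'float64']   # matched the template signature
```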

Milestones

This section describes the major milestones in the gufunc rewrite work.

The following milestones will land before the numba version 1.0 release.

  1. Replacement for current numpy gufunc system

    • CPU replacement
    • Non-CPU replacement (mostly there in numba as pure Python code)
    • Unified (g)ufunc implementation
    • Dynamic type inference
    • Array contiguity guarantee
      • Needed for the parallel accelerator features and better SIMD vectorization.
  2. Feature extension Lvl 1

    • Passthru/non-broadcasting arg

    • Potential API

      @guf('(m, n), (n), *, * -> (n), (m)')
      def foo(matrix, vector, extra1, extra2, output1, output2):
          # extra1 and extra2 are not broadcast
          pass

      # explicit output
      foo(matrix, vector, extra1, extra2, out=(output1, output2))
      # implicit output
      (output1, output2) = foo(matrix, vector, extra1, extra2)
    • Prioritized due to frequent requests from users.

  3. Generalized for more common numba types as dtypes.

    • Prioritized due to its need in the parallel accelerator.
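The passthru ('*') behavior proposed in milestone 2 can be emulated in pure Python/NumPy. This hypothetical apply_with_passthru helper (invented for this sketch) loops only the array arguments over the leading dimension and forwards the starred arguments unchanged to every kernel invocation:

```python
import numpy as np

def apply_with_passthru(kernel, arrays, passthru):
    """Loop the array arguments over their leading (loop) dimension;
    pass the passthru arguments to every kernel call unchanged."""
    n = arrays[0].shape[0]
    return np.array([kernel(*(a[i] for a in arrays), *passthru)
                     for i in range(n)])

def scaled_dot(x, y, scale):
    # 'scale' plays the role of a passthru (non-broadcast) argument
    return scale * (x * y).sum()

a = np.ones((4, 3))
b = np.ones((4, 3))
out = apply_with_passthru(scaled_dot, (a, b), (2.0,))
assert out.shape == (4,)
assert (out == 6.0).all()   # each core slice: 2.0 * 3
```

A real implementation would fold this into the gufunc loop machinery rather than a Python-level loop; the sketch only shows the intended calling semantics.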

The following milestones are scheduled for post v1.0. They are not ordered.

  1. Feature extension Lvl 2

    • Flexible type signature
      • Consider datashape type pattern to aid inference of output type?
        @guf('(m, n), (n), *, * -> (n), (m)',
             ['float32, int32, Any, Any -> int32, float32',
              'T, T, Any, Any -> T, T',])
      • overload resolution by first compatible type-signature.
  2. CPU AoT compilation

    • Compile to shared lib
    • distutils/setuptools helper (like CFFI)
  3. Generalized for any numba/ndtypes dtypes

    • Numba types to produce ndtypes spec
  4. Generalized for any sequence container

  5. General heterogeneous gufunc object (?)

    • make XND address-space aware
    • example: cpu-gufunc calling gpu-gufunc.
    • non-example: gpu-gufunc cannot call cpu-gufunc.
  6. GPU AoT compilation (?)

    • Potential use case: exporting code for use in machine-learning/deep-learning applications.

Other notes

  • We may consider performance enhancements, like inlining gufuncs and compiling explicit loop nests, for post v1.0. But we want a generic implementation first to define a stable API for v1.0.
  • Overload resolution: allow a custom resolution function in the underlying implementation. Different user-facing APIs may provide different resolution logic. This could be useful for reusing the new gufunc system to unify all of numba's function dispatch systems.

Appendix

1. More Background on NumPy (g)ufunc

NumPy universal functions (ufuncs) are powerful functional machinery for applying a computation over arrays of compatible shapes. They are the core of many NumPy builtin functions. Each ufunc has a kernel function that is an elementwise function. The kernel function can take scalars, vectors, or N-d array slices as elements. Ufuncs whose kernels take N-d array slices are called generalized ufuncs (gufuncs), and they define a shape signature for the array dimensions accepted by the kernel function. Put another way, basic ufuncs are just a special case of gufuncs whose kernel function takes 0-d array slices (thus scalars). For the rest of this document, we will simply refer to both as ufuncs.
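For example, NumPy's own np.vectorize accepts a gufunc-style shape signature, which illustrates how loop dimensions are broadcast while core dimensions follow the signature:

```python
import numpy as np

# a matrix-vector product as a gufunc: core dims from the signature
# '(m,n),(n)->(m)', broadcasting over any leading (loop) dimensions
matvec = np.vectorize(np.dot, signature='(m,n),(n)->(m)')

A = np.ones((4, 3, 2))   # a stack of four 3x2 matrices
v = np.ones(2)           # one vector, broadcast over the stack
out = matvec(A, v)
assert out.shape == (4, 3)    # loop dim 4, core output dim m=3
assert (out == 2.0).all()     # each entry: 1*1 + 1*1
```

np.vectorize runs the kernel as a Python-level loop, so it has the flexibility but not the performance of a compiled gufunc.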

The extension API for creating a user-defined ufunc is limited at the Python level. numpy.vectorize makes ufuncs out of Python functions, but its performance is limited. For the most efficient implementation, one must write the ufunc using the C-API.

Numba provides the numba.vectorize and numba.guvectorize decorators to simplify the creation of user-defined ufuncs and generalized ufuncs, respectively. These decorators also compile the kernel function; thus, they provide execution performance comparable to a custom ufunc written in C.

2. Ndtypes and XND

See http://ndtypes.readthedocs.io/en/latest/ and http://xnd.readthedocs.io/en/latest/.

@datnamer

datnamer commented Feb 1, 2018

How will the annotations and overload resolution interact with the planned improvements to the type system?

Have you considered using pep-484 type annotations? I'm especially interested in type constraints etc

@sklam

sklam commented Feb 1, 2018

The proposal will focus on the runtime implementation first, then on the typing system. The initial plan will try to create a replacement for the current system so we no longer depend on numpy.

For PEP-484, the gufunc feature needs to go beyond Python support. To be C-callable and AoT-compilable, the type signature needs to be language agnostic. This means we can't use PEP-484 directly. But we can always allow a translation of PEP-484 type annotations into our types. At the same time, there are still ongoing efforts to improve PEP-484 type annotations for numpy.

Type constraints are interesting. To be specific, are you referring to things like AnyStr = TypeVar('AnyStr', str, bytes) in https://www.python.org/dev/peps/pep-0484/#generics? There is a related concept in ndtypes called type kinds: http://ndtypes.readthedocs.io/en/latest/ndtypes/types.html#type-kinds. (ping @skrah)

@datnamer

datnamer commented Feb 2, 2018

Yes, those. I think they will be important for the more complex numba type system, because the current way of defining types/structs and interfaces is quite limited.

There are many, many different type and object and array systems for Python numerical computing (tensorflow, pytorch, numpy, numba, blaze, etc.). Now there is datashape, and the normal Python ecosystem is congregating around mypy. At the same time, there is the hard barrier between compiled and Python code, and various runtime multiple dispatch libraries.

Why not solve all that by standardizing on PEP 484 types and using generalized gufuncs as a multiple dispatch implementation, both under the hood for numba (so we can build up user-defined objects through a function lattice) and with PEP 484 types plus some compile-time type checking with mypy, to break down the barrier? Also, you can use PEP 544 protocols with associated gufuncs to build up standard abstractions for all these packages.

There was already talk from guido of a potential multiple dispatch resolution for runtime behavior of different function signatures. Maybe this can be incorporated into cpython, or at least be available to python.

Type kinds can map naturally to these protocols. There was some talk of a standard array protocol in the numba typing discussion. It would be a great if we can avoid another standard fragmentation.

Edit: I just realized you might have meant that the gufunc system must be usable from outside of Python, not just callable. That's a different constraint, and I'm not sure then where PEP-484 will fit.

@albop

albop commented Mar 6, 2018

Just bumped into this one. There is one issue I have with the current guvectorize which, I understand, comes from numpy's definition of gufuncs. The current definition of the core dimensions has some limitations:

  1. constants are not allowed: (a,2),(b,2)->(a,b) for instance
  2. the dimensions of the output must also appear in the input dimensions: (a,b),(a,b)->(c,a,b). (this is for me the most annoying case)
  3. one cannot do any operation on the dimensions: (a,b),(a,b)->(a+b,a,b) (kind of extends the preceding case).

I don't know how these limitations interfere with the refactoring, but I thought they were worth mentioning.

(ref to the ml: https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/UWpmVdpFQbM)
