@esc
Created February 22, 2022 17:24
Numba Vision Stuart
What should Numba be?
* Numba, as in the code base, should really be an extensible toolkit for writing compilers, the default user facing front end of which is _an_ implementation of a compiler for Python and (maybe!) NumPy.
* It needs to be sufficiently extensible to cover the spectrum of use cases, including but not limited to:
  * Typical users, just wanting to @jit some numerical functions (many, many users; the most common use case)
  * Those providing libraries for domain specific use (e.g. researchers - TARDIS-SN)
  * Those providing libraries for use in scientific computing as part of the numerical Python scaffold (e.g. pydata-sparse)
  * Those writing more advanced libraries containing their own data types etc. (e.g. AwkwardArray)
  * Compiler extenders wanting to write and explore new compiler use cases/needing a custom compiler (e.g. Bodo, omnisci-db)
  * Hardware vendors wanting to extend Numba support to their custom silicon (e.g. NVIDIA, Intel)
* There is no reason why this cannot be achieved, but it will mean that those consuming Numba’s APIs will need to accept that they have to do some custom work to get what they want. Making the compiler extensible means that any given group can override the parts that do not suit them. It is key that everyone understands that flexibility is more important than any one user’s or group’s requirements; after all, what is best suited to one use case may not be suited to another.
* What’s realistically on Numba’s roadmap:
  * Dealing with ever-changing bytecode operations and structure in Python. Ideally this would be addressed by a PEP that makes suggestions about, and implements, stabilisation, canonical forms, side-table information, etc.
  * Dealing with new NumPy versions
  * Dealing with new LLVM versions
  * Keeping releases rolling, fixing bugs, responding to users/requests, supporting new hardware.
  * There are around 7 million downloads a month across all distribution mechanisms, so ensuring that Numba is entirely reliable and consistent is an absolute must.
* Where would I like to see Numba go…
  * So much of the Numba code base is potentially reusable by any silicon or synthetic target; for example, adding two int32s is pretty much the same everywhere. The target extension API work needs finishing to make this reuse possible and to reduce the amount of duplicated code and the maintenance burden. It will also make it very easy to add new targets, which massively increases the flexibility of the compiler.
  * Numba needs to be faster at compiling; this is likely achieved through concurrent and lazy compilation. The first item of work is changing lowering to make it “mechanical”. The second is to delay all lowering until typing and the compilation pipelines are complete. After that there are a few routes to a massive performance increase, involving one or both of Python concurrency and ORCJIT.
  * Numba needs to be able to talk to other languages more easily, C++ especially. It also needs to be able to integrate better with external libraries and packages.
  * The AOT compilation story needs to be thought through and better implemented.
  * As above, but for caching and distributing caches.
  * Structured exceptions need adding so that it is easy to increase or decrease error message output (this has to be extensible, as everyone wants something different).
* Need to consider that Numba the project is different to Numba the code base.
* The project itself needs long term commitment and funding from industry:
  * The time it takes to get someone up to speed with the code base is large, particularly to the point where they can exercise judgement about the impact of a proposed change. Casual contributors are unlikely to be in a position to assess most involved changes; this is similar to LLVM.
  * The maintenance burden is high and gets higher:
    * Python/NumPy/LLVM impact this
    * New features impact this
    * Extending/increasing flexibility impacts this
  * A number of engineers are heavily personally invested in the project and Numba would struggle to survive without them; the same can be said for Anaconda and the infrastructure resources it supplies.
* Whilst the Numba code base as a “toolkit” looks to solve a lot of use cases for various parties, there’s additional support structure needed:
  * Profilers: both line and function, via statistical and augmented source approaches.
  * Coverage tools
  * Debuggers for use on user defined source: cross language and cross hardware
  * Compiler debugger/reversing tool integration
  * Compiler “explorers”, various IRs, asm, vectorisation output etc.
  * Integration with jupyter-lab
  * Other core packages
    * numba-scipy (or should effort go into SciPy adopting Numba/knowing about it, or both!)
    * numba-extras (incubation area for new features)
* Non-physical aspects of the project also need considering:
  * The Numba community is built around some often observed OSS values (cooperation, collaboration, fairness, elements of meritocracy etc).
  * It is important to preserve these values as it helps maintain and bind the community that built and continues to build Numba.
* In summary, there are many places Numba can go in the future, but the key is for it to be well supported and to focus on flexibility/extensibility to let others build what they want. Numba is already part of the HPC/scientific computing “furniture” and with some effort it can become the same for implementing compilers.
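
To make the "override the part that does not suit you" idea concrete, here is a minimal toy sketch in plain Python. It is not Numba's API: `ToyPipeline`, the pass names, and the emitted instruction strings are all hypothetical, invented purely for illustration. It only shows the shape of the idea: a default pipeline of named passes, where a hardware vendor or compiler extender swaps out one stage (here, lowering) and reuses everything else.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A pass takes the compilation state (a dict) and returns the updated state.
Pass = Callable[[dict], dict]


@dataclass
class ToyPipeline:
    """Ordered list of named passes. Hypothetical; not Numba's pass manager."""
    passes: List[Tuple[str, Pass]] = field(default_factory=list)

    def add(self, name: str, fn: Pass) -> "ToyPipeline":
        self.passes.append((name, fn))
        return self

    def replace(self, name: str, fn: Pass) -> "ToyPipeline":
        # Swap a single stage, leaving every other stage untouched.
        for i, (existing, _) in enumerate(self.passes):
            if existing == name:
                self.passes[i] = (name, fn)
                return self
        raise KeyError(f"no pass named {name!r}")

    def run(self, source: str) -> dict:
        state = {"source": source, "log": []}
        for name, fn in self.passes:
            state = fn(state)
            state["log"].append(name)
        return state


def parse(state: dict) -> dict:
    # Toy front end: whitespace-separated tokens.
    state["tokens"] = state["source"].split()
    return state


def infer_types(state: dict) -> dict:
    # Toy typing: every identifier is assumed to be an int32.
    state["types"] = {t: "int32" for t in state["tokens"] if t.isidentifier()}
    return state


def lower_cpu(state: dict) -> dict:
    # Default lowering: emit a made-up CPU instruction.
    state["code"] = "add.i32 " + ", ".join(sorted(state["types"]))
    return state


def lower_accel(state: dict) -> dict:
    # A vendor's lowering for a made-up accelerator target.
    state["code"] = "accel.add.i32 " + ", ".join(sorted(state["types"]))
    return state


def default_pipeline() -> ToyPipeline:
    return (ToyPipeline()
            .add("parse", parse)
            .add("typing", infer_types)
            .add("lowering", lower_cpu))


# The vendor reuses parsing and typing and overrides only lowering.
accel_pipeline = default_pipeline().replace("lowering", lower_accel)
```

In this sketch the typical user only ever calls `default_pipeline().run(...)`, while the custom target shares the front end and typing stages unchanged. That is the kind of reuse the int32 example above alludes to, under the assumption that most stages are target independent.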
@jpivarski

> Numba needs to be able to talk to other languages more easily, C++ especially.

If integration with Cling is plausible, then that could be a way to do it. https://compiler-research.org/ (my colleagues) are porting Cling out of ROOT as a stand-alone project, partly an LLVM sub-project. If the LLVM IR that Numba and Cling generate is remotely compatible, that could be an opportunity.

Cling is closely tied to the Cppyy project, which defines a semantic map between C++ concepts and Python concepts, though it implements them in fully dynamic Python.
