
@alex
Created January 27, 2012 17:53

Increased Compatibility for NumPyPy

[This is a follow up on the work I did last semester. Some of the descriptive text is the same.]

PyPy is an implementation of Python, written in (a restricted subset of) Python, featuring an advanced tracing just-in-time compiler. It is currently the fastest implementation of Python available.

NumPy is a framework for vector and matrix numerical calculations in Python. It is written largely in C, utilizing the CPython extension API, and thus does not run on PyPy.

Reimplementing NumPy for PyPy is considered advantageous because one of NumPy's chief goals is performance, and PyPy a) is the demonstrated performance king for Python VMs, and b) has the potential to further speed up NumPy. NumPy is widely used by scientists in all fields, and the ability to bring greater speed to their applications, without the need to resort to languages such as C/C++/Java, is considered both a technical and social gain.

Since last semester NumPyPy has made incredible strides in compatibility. We now support multi-dimensional arrays with a wide variety of data types. Therefore, my goal for this semester is to increase compatibility by adding more of the methods for interacting with data that NumPy has. At the time of writing, NumPyPy implements 66/161 methods on arrays, 22/48 methods on datatypes, and 121/559 module-level methods.

This project will address these missing methods in three chunks:

  • First, exposing ndarray.ctypes. This exposes low-level information about the memory location of an array's data, and can be used, in conjunction with f2pypy, to interface directly with existing Fortran and C libraries, such as ATLAS, BLAS, and LAPACK.
  • Second, implementing methods that are views on the memory of existing data, such as ravel(), which exposes a one-dimensional view of a multi-dimensional array. This step includes allowing arrays to be created in Fortran (column-major) iteration order, as opposed to the default C (row-major) order.
  • The final goal will be bringing in more of NumPy's pure-Python code. Though NumPy is largely written in C, there are portions of it written in Python. This step will involve analysing the dependencies of the different pieces of pure-Python code, implementing those dependencies, and then bringing in the pure-Python code itself. For example, NumPy includes a pure-Python Matrix class, which relies on being able to subclass ndarray, which NumPyPy doesn't yet support.
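
To make the first two items more concrete, here is a minimal pure-Python sketch of the ideas involved. It is not NumPyPy's implementation and does not use NumPy's API. The helpers `strides_for` and `flat_offset` are hypothetical names introduced only for illustration, and the standard library's `array` and `ctypes` modules stand in for an ndarray's data buffer. It shows how one flat buffer can be read in either C (row-major) or Fortran (column-major) order just by changing the strides, and how the raw address of that buffer can be shared with ctypes without copying, which is the kind of information a C or Fortran routine needs.

```python
import array
import ctypes


def strides_for(shape, order="C"):
    """Return element strides (not byte strides) for a dense array of `shape`."""
    strides = [0] * len(shape)
    step = 1
    # C order: the last axis varies fastest. Fortran order: the first axis does.
    axes = range(len(shape) - 1, -1, -1) if order == "C" else range(len(shape))
    for axis in axes:
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


def flat_offset(index, strides):
    """Map a multi-dimensional index to a position in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3)
c_strides = strides_for(shape, "C")  # row-major strides
f_strides = strides_for(shape, "F")  # column-major strides

# One flat buffer of doubles; a "view" is just a shape plus strides over it.
buf = array.array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
c_value = buf[flat_offset((1, 0), c_strides)]  # index (1, 0) read row-major
f_value = buf[flat_offset((1, 0), f_strides)]  # same index read column-major

# The raw address and element count are what an external routine would need.
address, length = buf.buffer_info()
raw = (ctypes.c_double * length).from_buffer(buf)  # shares memory, no copy
raw[0] = 42.0  # a write through the ctypes view shows up in the original buffer
```

Under these assumptions, the same index `(1, 0)` lands on a different element depending on the iteration order, and the one-dimensional buffer itself plays the role of the flattened view. In NumPy the corresponding information is carried by `ndarray.strides` and `ndarray.ctypes`.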

These would be implemented on the following timeline:

  • Weeks 1-3: Exposing ndarray.ctypes and exposing any Fortran libraries that NumPy does.
  • Weeks 4-6: Implementing Fortran/column-major storage for arrays.
  • Weeks 7-8: Adding in new data-manipulation methods.
  • Weeks 9-10: Adding support for subclassing ndarray.
  • Weeks 11-14: Bringing in additional NumPy Python code.

All milestones include tests and documentation.
