Work Product Submission for GSoC 2021

Integrating NumPy into Syft - GSoC 2021

As the ten weeks of GSoC come to an end, I'd like to summarize my project work and share some final thoughts on the experience.

PySyft and NumPy

PySyft is a library that enables decoupling of private data and model training, and thus, allows us to compute over data we do not own or see. NumPy is the leading library for scientific computing with Python. It is an open-source library that allows for powerful N-dimensional arrays, numerical computing tools, and high performance.

The Project

The project aimed to integrate NumPy into Syft, allowing efficient mathematical computations to be brought to the field of Privacy-Preserving Machine Learning (PPML). While Syft had support for essential but limited NumPy functionality, the code was embedded into Syft itself. Any intended modifications had to be made painstakingly -- often requiring a lot of manual testing.

The project goals were shifted slightly to help build an automated crawler and package builder (christened RedBack) to provide external library support for Syft. This would add support for numerous libraries which could be used with Syft, requiring minimal manual effort.

The baseline for project completion was set to build a NumPy support package with RedBack features and move the integrated library out of Syft, with the same level of functionality as currently provided.

Crawling and Building Support Packages

The first part of the project was to make a crawler for external libraries. This crawler uses the module tree of the library to create a JSON file with all submodules, classes and methods of the package and their return types.
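As a rough illustration of the idea (not RedBack's actual crawler), the sketch below walks a package's module tree with the standard library and records classes, methods, functions and any annotated return types. The function names, the JSON layout and the "_unknown" placeholder are assumptions made for this example, and the stdlib json package stands in for a real external library.

```python
import importlib
import inspect
import json
import pkgutil


def _return_type(obj):
    """Best-effort return annotation as a string; "_unknown" if unavailable."""
    try:
        annotation = inspect.signature(obj).return_annotation
    except (TypeError, ValueError):
        return "_unknown"
    if annotation is inspect.Signature.empty:
        return "_unknown"
    if isinstance(annotation, str):
        return annotation
    return getattr(annotation, "__qualname__", repr(annotation))


def crawl(package_name):
    """Map every importable module of a package to its classes and functions."""
    package = importlib.import_module(package_name)
    modules = [package]
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(search_path, prefix=package_name + "."):
        if info.name.endswith(".__main__"):
            continue  # entry-point scripts are not part of the API
        try:
            modules.append(importlib.import_module(info.name))
        except Exception:
            continue  # skip submodules that fail to import

    tree = {}
    for module in modules:
        entry = {"classes": {}, "functions": {}}
        for name, obj in inspect.getmembers(module):
            # Ignore names that were merely imported from elsewhere.
            if getattr(obj, "__module__", None) != module.__name__:
                continue
            if inspect.isclass(obj):
                entry["classes"][name] = {
                    method: _return_type(func)
                    for method, func in inspect.getmembers(obj, inspect.isfunction)
                }
            elif inspect.isfunction(obj):
                entry["functions"][name] = _return_type(obj)
        tree[module.__name__] = entry
    return tree


tree = crawl("json")
as_json = json.dumps(tree, indent=2)  # the kind of file a crawler could write out
```

Unannotated callables are common, so a real crawler would need another source for return types; this sketch simply records a placeholder.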

The next part of the project was to use this JSON file to create a Syft-compatible AST and build a suitable package around it. PyScaffold was used for this purpose, and the result was the ability to create syft-libs installable via pip.
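Syft's own AST classes are not shown here. As a stand-in, the sketch below flattens a crawler-style JSON document into (path, return type) entries and resolves each path to the real object, which is roughly the information an allowlist-style AST needs. The JSON layout and the helper names flatten and resolve are hypothetical.

```python
import importlib
import json

# A small crawler-style document; the layout is an assumption for this sketch.
CRAWLED = json.loads("""
{
  "json": {"classes": {}, "functions": {"dumps": "str", "loads": "_unknown"}},
  "json.decoder": {
    "classes": {"JSONDecoder": {"decode": "_unknown"}},
    "functions": {}
  }
}
""")


def flatten(crawled):
    """Turn the nested document into a flat list of (dotted path, return type)."""
    entries = []
    for module_name, members in crawled.items():
        for func, ret in members["functions"].items():
            entries.append((f"{module_name}.{func}", ret))
        for cls, methods in members["classes"].items():
            # Calling a class returns an instance of that class.
            entries.append((f"{module_name}.{cls}", f"{module_name}.{cls}"))
            for method, ret in methods.items():
                entries.append((f"{module_name}.{cls}.{method}", ret))
    return entries


def resolve(path):
    """Import the longest importable prefix of a path, then walk attributes."""
    parts = path.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {path!r}")


entries = flatten(CRAWLED)
objects = {path: resolve(path) for path, _ in entries}
```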

Due to its large scope, project RedBack was divided among three people. Since other GSoC participants handled these sections, more information can be found here: https://github.com/OpenMined/PySyft/tree/feature/redback/packages/syft-libs .

Automatic Library Loader

As part of project RedBack, I undertook the task of creating an auto import mechanism for Syft. The purpose of the autoloader was to import supported syft-libs automatically when the relevant external package was imported.

While the task seems trivial at first glance, it turned out to be a challenge. Both the Python import statement and the module cache (sys.modules) call the underlying C code directly (for example, PyDict_SetItem and PyDict_GetItem) when modifying the set of imported modules. As a result, every attempt to influence imports by manipulating the sys.modules dictionary proved futile.
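The general behaviour behind this can be seen without touching sys.modules at all. The sketch below is an illustration of how C-level dict operations skip Python-level overrides, not the exact experiment I ran: a dict subclass overrides __setitem__, yet writes that stay inside the C implementation never reach the override. The class name RecordingDict is made up for the example.

```python
calls = []


class RecordingDict(dict):
    """A dict subclass that records every key set through __setitem__."""

    def __setitem__(self, key, value):
        calls.append(key)
        super().__setitem__(key, value)


d = RecordingDict()
d["a"] = 1                    # Python-level assignment: the override runs
d.update(b=2)                 # dict.update is implemented in C: override skipped
dict.__setitem__(d, "c", 3)   # the base slot, comparable to a PyDict_SetItem call

# All three keys are stored, but only the first write was observed.
stored_keys = sorted(d)
observed_keys = list(calls)
```

Code that writes to a dictionary through the C API behaves like the last two lines, which is why a hook placed on the dictionary never sees those writes.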

The workaround turned out to be monkey-patching the import mechanism itself. When importing, if the package being imported is a supported library, its relevant syft-lib and those for other packages it requires (if found) would be added to a queue. Once the initial imports are finished, the items in the queue would be imported in order. The queue processing required a wrapt post-import hook.

syft-numpy

Once project RedBack had basic functionality, it was time to make support packages and move the libraries out of Syft. After creating the basic package, modifying the JSON and adding relevant protos, serde and tests, I ran into another issue -- the package was trying to reference NumPy code that was only required if building it from source. A number of errors popped up when trying to use an already installed instance of the library.

This needed to be fixed to have a working support lib. This issue can largely be attributed to the code generation techniques used in the NumPy source code and C API that often require several compilers.

Other support packages created and installed using the same approach worked quite well.

References to Project Work

Here are the links to the work done as part of this project:

Future Scope

Here are a few things that can be improved upon in RedBack and syft-numpy.

  1. Functionality: Finding a workaround for the issues currently present in syft-numpy. This is essential to restoring functionality and providing the base for further NumPy support.
  2. Performance: Performance testing and benchmarking are an important part of supporting any package, and syft-numpy is no exception. I hope to test the performance of syft-numpy and compare it to when the library was a part of the Syft code itself.
  3. RedBack improvements: There are certain features planned for further RedBack work along with improving some of the current code.

Final Thoughts

GSoC has been a very fulfilling experience for me. From finding a completely new field of interest to learning so much about things I barely paid any attention to before, it was an extraordinary and illuminating journey. The OpenMined community has been a source of support and wonder. It gave me the rare opportunity to interact with brilliant people who bring virtuosity and innovation to all they do. Being able to help build software that is used by many people across numerous fields has given me a sense of accomplishment.

In general, GSoC is an excellent gateway into the world of open source, even for those already familiar with it. It not only lets one level up technical skills but also teaches the importance of timely, open communication, constructive discussion and collaboration in the community. It's an invaluable program no one should miss out on.
