Name | Arpitsinh Vaghela |
---|---|
Project | Integrating Pandas into Syft |
Organisation | OpenMined |
Mentors | Madhava Jay, Tudor Cebere |
Integrating a library into Syft requires:
- Protobufs for the types that are communicated between nodes.
- A wrapper class for each of these types, with methods to serialize (`object2proto`) and deserialize (`proto2object`) it.
- A list of the modules, classes, and functions/methods to support.
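
The wrapper pattern above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Syft's actual code: `Point`, `PointProto`, and `PointWrapper` are hypothetical names, and a dataclass stands in for the generated protobuf message so the example runs without Syft or `protobuf` installed. Only the `object2proto`/`proto2object` method names come from the text.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical third-party type that has to travel between nodes."""
    x: float
    y: float


@dataclass
class PointProto:
    """Stand-in for a generated protobuf message (hypothetical)."""
    x: float = 0.0
    y: float = 0.0


class PointWrapper:
    """Hypothetical wrapper exposing the serialize/deserialize pair for Point."""

    @staticmethod
    def object2proto(obj: Point) -> PointProto:
        # Copy each field of the library object into the message.
        return PointProto(x=obj.x, y=obj.y)

    @staticmethod
    def proto2object(proto: PointProto) -> Point:
        # Rebuild the library object from the message fields.
        return Point(x=proto.x, y=proto.y)


if __name__ == "__main__":
    original = Point(1.5, -2.0)
    restored = PointWrapper.proto2object(PointWrapper.object2proto(original))
    print(restored == original)  # True
```

In a real integration the message class would be generated from a `.proto` definition instead of being written by hand.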
Integrating a library this way is tedious and time-consuming, so parts of the process were automated. The goals of the project were to:
- Allow library support to be defined outside the Syft codebase wherever possible.
- Provide tooling that generates defaults automatically.
- Allow opt-in importing of a target library.
- Allow package support to be defined in a JSON-like configuration file.
- Build a separate deny list containing all known libraries/methods that are potentially insecure, preventing their use by default.

Together, these would make it easier to add support for any library to Syft.
- Move statsmodels support out of Syft core
  - Move library support for `statsmodels` from `syft.lib` into its own new package, `packages/syft-libs/syft-statsmodels`.
  - Add library support to the Syft AST from a JSON config file.
  - A PyScaffold extension that generates library support packages (`syft-XYZ`) with a custom directory structure.
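
Loading library support from a config file can be sketched as follows. The JSON layout (`lib`, `modules`, `classes`, `methods` keys), the library name `examplelib`, and the helper `load_ast_entries` are all hypothetical and only illustrate how a config might be flattened into the `(path, return_type)` pairs the Syft AST works with; the real schema used by the support packages may differ, and the call that actually adds paths to the AST is deliberately left out.

```python
import json
from typing import List, Tuple

# Hypothetical config for a hypothetical library; the real schema may differ.
EXAMPLE_CONFIG = """
{
  "lib": "examplelib",
  "modules": ["examplelib", "examplelib.models"],
  "classes": [["examplelib.models.Model", "examplelib.models.Model"]],
  "methods": [["examplelib.models.Model.fit", "examplelib.models.Results"]]
}
"""


def load_ast_entries(config_text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a support config into module paths and (path, return_type) pairs."""
    config = json.loads(config_text)
    modules = list(config.get("modules", []))
    # Classes and methods are both stored as [path, return_type] pairs.
    pairs = config.get("classes", []) + config.get("methods", [])
    entries = [(path, return_type) for path, return_type in pairs]
    return modules, entries


if __name__ == "__main__":
    modules, entries = load_ast_entries(EXAMPLE_CONFIG)
    for path, return_type in entries:
        print(f"{path} -> {return_type}")
```

Keeping this data in JSON rather than Python means a support package can be regenerated or edited without touching Syft's own source.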
- Allow Syft to internally deny methods and classes that may give rise to security issues.
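
A default deny list can be sketched as a set of dotted paths, assuming that denying a module or class also denies everything beneath it. `DENY_LIST`, `is_denied`, and `filter_allowed` are hypothetical names, and the entries are illustrative only (unpickling untrusted data can execute arbitrary code); this is not Syft's actual deny list.

```python
from typing import Iterable, List, Set

# Illustrative entries only, not Syft's real deny list.
DENY_LIST: Set[str] = {
    "pickle.loads",
    "pandas.read_pickle",
    "os",  # denying a module denies every path beneath it
}


def is_denied(path: str, deny_list: Set[str] = DENY_LIST) -> bool:
    """Return True if `path` or any dotted prefix of it is on the deny list."""
    parts = path.split(".")
    # Check "os", then "os.system", ... so a denied parent blocks its children.
    return any(".".join(parts[:i]) in deny_list for i in range(1, len(parts) + 1))


def filter_allowed(paths: Iterable[str], deny_list: Set[str] = DENY_LIST) -> List[str]:
    """Drop denied paths before they are added to the AST."""
    return [p for p in paths if not is_denied(p, deny_list)]


if __name__ == "__main__":
    print(filter_allowed(["pandas.DataFrame", "pandas.read_pickle", "os.system"]))
    # ['pandas.DataFrame']
```

Matching on whole dotted segments, rather than raw string prefixes, keeps an entry like `os` from accidentally blocking an unrelated module such as `osx`.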
- CI to test external library support packages, and a meta package
  - CI to test library support packages.
  - A meta package that, on installation, installs all support packages in `packages/syft-libs`:

    ```
    $ pip install syft-lib # installs syft-pandas, syft-xgboost ...
    ```
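
One way a meta package could collect its dependencies is to scan `packages/syft-libs` for support-package directories. This is a hypothetical sketch: it assumes every subdirectory named `syft-*` is one installable package, and `discover_support_packages` is not a function from the project.

```python
from pathlib import Path
from typing import List


def discover_support_packages(libs_dir: str) -> List[str]:
    """Return the sorted names of `syft-*` package directories under `libs_dir`."""
    root = Path(libs_dir)
    return sorted(
        child.name
        for child in root.iterdir()
        # Only directories count; stray files such as READMEs are skipped.
        if child.is_dir() and child.name.startswith("syft-")
    )
```

The resulting list could then be passed as `install_requires` to `setuptools.setup`, so installing the meta package pulls in every support package at once.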
- Union and Primitive Type Support

  To support a method or function, the Syft AST requires a `(method_path, return_type)` tuple.
  - If a method returns a Python primitive, the `return_type` path is expected to point to Syft's wrapper type, e.g. `syft.lib.python.Dict` rather than `dict`; this conversion was automated.
  - If the return type is a `Union`, it is expected to be an instance of `UnionGenerator`, i.e. `Union[int, float] => UnionGenerator[syft.lib.python.Int, syft.lib.python.Float]`; this conversion was automated.
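
Both conversions can be sketched with the standard `typing` helpers (Python 3.8+). `to_syft_type_path` is a hypothetical helper, and the primitive mapping beyond `dict`, `int`, and `float` (the types named in the text) is an assumption about Syft's `syft.lib.python` names. The sketch only handles the `typing.Union` spelling; generic aliases such as `List[int]` and the `int | float` syntax are not covered.

```python
from typing import Union, get_args, get_origin

# Assumed mapping from Python primitives to their Syft wrapper paths.
PRIMITIVE_PATHS = {
    int: "syft.lib.python.Int",
    float: "syft.lib.python.Float",
    bool: "syft.lib.python.Bool",
    str: "syft.lib.python.String",
    list: "syft.lib.python.List",
    dict: "syft.lib.python.Dict",
}


def to_syft_type_path(tp) -> str:
    """Convert a return annotation into the path string expected by the AST."""
    if get_origin(tp) is Union:
        # Convert each member, then wrap them in UnionGenerator[...].
        members = ", ".join(to_syft_type_path(arg) for arg in get_args(tp))
        return f"UnionGenerator[{members}]"
    if tp in PRIMITIVE_PATHS:
        return PRIMITIVE_PATHS[tp]
    # Non-primitive classes keep their own fully qualified path.
    return f"{tp.__module__}.{tp.__qualname__}"


if __name__ == "__main__":
    print(to_syft_type_path(dict))  # syft.lib.python.Dict
    print(to_syft_type_path(Union[int, float]))
    # UnionGenerator[syft.lib.python.Int, syft.lib.python.Float]
```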
- Generate exploration notebooks within the package

  To automate generating the library AST, a script autogenerates the paths to classes, modules, and methods along with their `return_type`. If the script fails to determine the `return_type` of a function, it creates notebooks that help retrieve the `return_type` dynamically. Running an update script then updates the config JSON based on these notebooks.
  - Updated the extension to add all of these exploration notebooks to the `_missing_return` directory.
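
The missing-return workflow can be sketched with the standard library alone. This is a hypothetical reconstruction, not the project's script: it assumes a return type counts as "found" only when the callable carries a return annotation, and it writes a minimal nbformat-4 notebook per missing path into a `_missing_return` directory. `find_missing_returns` and `write_exploration_notebook` are illustrative names.

```python
import inspect
import json
from pathlib import Path
from typing import Callable, Dict, List


def find_missing_returns(functions: Dict[str, Callable]) -> List[str]:
    """Return the paths whose callables carry no return annotation."""
    missing = []
    for path, func in functions.items():
        try:
            annotation = inspect.signature(func).return_annotation
        except (TypeError, ValueError):
            # Some builtins expose no signature; treat them as missing too.
            annotation = inspect.Signature.empty
        if annotation is inspect.Signature.empty:
            missing.append(path)
    return missing


def write_exploration_notebook(path: str, out_dir: str) -> Path:
    """Write a one-cell notebook for inspecting the runtime return type of `path`."""
    top_level = path.split(".")[0]
    source = [
        f"import {top_level}\n",
        f"# TODO: replace with a realistic call to {path}\n",
        f"result = {path}()\n",
        "type(result)",
    ]
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    target = Path(out_dir) / "_missing_return" / f"{path}.ipynb"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(notebook, indent=1))
    return target
```

A developer fills in the call with realistic arguments and runs the cell; an update script can then read the observed type back into the config JSON.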
- A class or function can often be accessed through multiple paths. If methods of a class are missing a `return_type`, it would otherwise have to be updated separately in every path from which the class/function can be accessed.
  - Updated the JSON generation script and the update script to propagate `return_type` changes made to the original path to all of these paths.
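
The propagation step can be sketched by resolving each dotted path and treating two paths as aliases when they yield the same object. `resolve` and `propagate_return_types` are hypothetical helpers, and `"example.ReturnType"` is a placeholder value. The example relies on `json.JSONDecoder` being re-exported from `json.decoder`, which holds in CPython's standard library.

```python
import importlib
from typing import Dict, Optional


def resolve(path: str):
    """Import the longest importable module prefix of `path`, then getattr the rest."""
    parts = path.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {path}")


def propagate_return_types(
    return_types: Dict[str, Optional[str]], original: str
) -> Dict[str, Optional[str]]:
    """Copy the return type set on `original` to every path that aliases it."""
    target = resolve(original)
    updated = dict(return_types)  # leave the caller's mapping untouched
    for path in return_types:
        if path != original and resolve(path) is target:
            updated[path] = return_types[original]
    return updated


if __name__ == "__main__":
    paths = {
        "json.decoder.JSONDecoder.decode": "example.ReturnType",  # set by hand
        "json.JSONDecoder.decode": None,  # same function via a re-export
        "json.JSONEncoder.encode": None,  # unrelated method
    }
    print(propagate_return_types(paths, "json.decoder.JSONDecoder.decode"))
```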
- Generate a wrapper class based on the attributes of a type/class, without needing to create a proto for the type.
- Add support for all pandas Indexes and Indexers (e.g. `_LocIndexer`).
- Add support for Window, GroupBy, and Resampling.
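
Generating a wrapper from a type's attributes instead of a hand-written proto could look roughly like the sketch below. It assumes the wrapped object's state lives in plain instance attributes with JSON-serializable values, and it returns a JSON string where a proto message would normally go. `make_attribute_wrapper` and `RollingSpec` are hypothetical; real pandas objects such as `_LocIndexer` would need extra handling.

```python
import json
from typing import Any, Dict, Type


def make_attribute_wrapper(cls: Type) -> Type:
    """Build a wrapper class that serializes `cls` instances from their attributes."""

    class AttributeWrapper:
        wrapped_type = cls

        @staticmethod
        def object2proto(obj) -> str:
            # Use the instance's attribute dict in place of a generated proto message.
            return json.dumps({"type": cls.__qualname__, "attrs": vars(obj)})

        @staticmethod
        def proto2object(blob: str):
            data: Dict[str, Any] = json.loads(blob)
            # Bypass __init__ and restore the saved attributes directly.
            obj = cls.__new__(cls)
            obj.__dict__.update(data["attrs"])
            return obj

    AttributeWrapper.__name__ = f"{cls.__name__}Wrapper"
    return AttributeWrapper


class RollingSpec:
    """Hypothetical stand-in for a library object with simple attributes."""

    def __init__(self, window: int, min_periods: int) -> None:
        self.window = window
        self.min_periods = min_periods


if __name__ == "__main__":
    Wrapper = make_attribute_wrapper(RollingSpec)
    restored = Wrapper.proto2object(Wrapper.object2proto(RollingSpec(3, 1)))
    print(restored.window, restored.min_periods)  # 3 1
```

Skipping `__init__` on restore avoids re-running constructor logic, at the cost of trusting that the saved attributes describe a valid object.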