The following report summarizes the work done during Google Summer of Code 2022 along with the results, scope for improvements and future work. This also serves as the final project report with all the contributions.
- Name: Madhav Mittal
- Email: madhav.mittal.mat19@itbhu.ac.in
- GitHub Username: Madhav2310
- LinkedIn: https://www.linkedin.com/in/madhav-mittal-599569192/
- University: Indian Institute of Technology (BHU), Varanasi
- Organization: Python Software Foundation
- Sub-Organization: LPython
- Project Title: Implement modules from the Python Standard Library
- Project Link: github.com/lcompilers/lpython
- Project Mentors: Ondřej Čertík, Gagandeep Singh, Rohit Goswami
LPython is an ahead-of-time compiler for Python, built using the Abstract Semantic Representation (ASR) technology. It is currently in the pre-alpha stage and under heavy development. LPython is written in C++, and it has multiple backends to generate code, including LLVM and C++. It is designed as a library with separate building blocks – the parser, Abstract Syntax Tree [AST], Abstract Semantic Representation [ASR], semantic phase, codegen – that are all exposed to the user or developer in a natural way to make it easy to contribute back. It works on Windows, Linux, and macOS.
The speed of LPython comes from the high-level optimizations done at the ASR level, as well as the low-level optimizations that LLVM can do. My project, in particular, involved discussing which modules LPython would need (initially from a scientific computing perspective), creating a priority list, and then implementing each module properly. The long-term aim of this project was to help LPython eventually work for any Python code.
- Implementing priority modules needed for LPython.
- Creating extensive integration tests for respective functions in modules.
- Zero bugs - Fix the currently identified bugs.
- The best possible performance for numerical array-oriented code.
- Compile a subset of Python, stay compatible with Python 3, and run on all platforms.
- Fast compilation.
- Make the documentation user- and developer-friendly.
- Weekly video conferences over Google Meet were the primary mode of communication with the mentors.
- The LPython Zulip workspace, along with GitHub Issues, was used to resolve doubts and to discuss suggestions and comments for faster coordination.
- Weekly blogs and updates were shared with the community to keep everyone informed about the project's progress.
Phase 1: Math, String and Decimal libraries
- #377: With this PR, I implemented the `trunc()` function of the `Math` Library.
- #395: Raised an issue for passing a list iterable in a function.
- #397: Initial implementation of the `Capitalize()` function of the `String` Library.
- #564: With this PR, I implemented the `cbrt()` (cube root) and `Exp2` functions of the `Math` Library.
- #567: Raised an issue to clear up the handling of different function arguments and corresponding return types.
- #581: With this PR, I attempted overloading the `pow()` and `Fabs` functions of the `Math` Library.
- #599: Bug: Overload based on return type.
- #669: With this PR, I expanded the `String` Library with `Upper()` and `Lower()` functions.
- #679: With this PR, I initiated the implementation of the `Decimal` Library with the `dataclass`.
- #696: Raised an issue about the need for class functionality for implementing various libraries like decimal, fraction, numbers etc.
- #722: Raised an issue about a bug in annotation assignment in `dataclass`.
- #731: With this PR, I overloaded the `pow()` and `Fabs` functions of the `Math` Library for several datatypes.
- #732: Raised an issue about the missing feature of iterating over strings and other iterables.
- #733: Raised an issue about the bug which fails string to list casting.
- #747: With this PR, I did integration and reference testing of the `Pow` and `Fabs` functions of the `Math` Library.
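To give a feel for the kind of functions covered in Phase 1, here is a minimal sketch in plain Python of `trunc`, `cbrt`, `exp2` and `fabs`. It is an illustration only and not the code merged into LPython: the bodies, the lowercase names and the use of the built-in `int`/`float` annotations (rather than LPython's own fixed-width types) are my assumptions for this example.

```python
# Illustrative sketch only: plain-Python versions of some Phase 1 functions.
# These are NOT the LPython implementations; names and bodies are assumptions.

def trunc(x: float) -> int:
    """Drop the fractional part, rounding toward zero."""
    # int() on a float already truncates toward zero in Python.
    return int(x)


def cbrt(x: float) -> float:
    """Real cube root, defined for negative inputs as well."""
    # A negative float raised to a fractional power gives a complex number
    # in Python, so the sign is handled separately.
    if x < 0.0:
        return -((-x) ** (1.0 / 3.0))
    return x ** (1.0 / 3.0)


def exp2(x: float) -> float:
    """Return 2 raised to the power x."""
    return 2.0 ** x


def fabs(x: float) -> float:
    """Absolute value of a float."""
    if x < 0.0:
        return -x
    return x
```

Integration tests for such functions typically compare results against known values within a small tolerance, since floating-point roots and powers are rarely exact.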
Phase 2: Statistics and Numpy_intrinsic libraries
- #769: With this PR, I initiated the implementation of the `Statistics` Library with the `Mean`, `Geometric Mean` & `Harmonic Mean` functions.
- #770: Raised an issue about the bug where DoLoops increment by `i32` but fail with every other datatype.
- #771: With this PR, I solved the #770 issue by generalizing the increment for all values `i8`, `i16`, `i32`, and `i64`.
- #876: With this PR, I overloaded `Pow` for Modulo third argument along with integration tests.
- #877: With this PR, I implemented the `Fmean` function and overloaded `Mean` for various datatypes.
- #901: With this PR, I expanded the `Statistics` library with `Variance` and `Stdev` functions.
- #956: With this PR, I overloaded the `Gmean` and `Hmean` functions for other datatypes.
- #958: With this PR, I expanded the `Statistics` library with the `Mode` function.
- #961: With this PR, I expanded the `Statistics` library with `Covariance` and `Correlation` functions.
- #1002: With this PR, I expanded the `Numpy_intrinsic` library with `Sinh` and `Cosh` functions along with extensive tests.
- #1003: With this PR, I overloaded `Covariance` and `Correlation` functions for various datatypes.
- #1004: With this PR, I expanded the `Statistics` library with `Pvariance` and `Pstdev` functions.
- #1005: With this PR, I expanded the `Statistics` library with the `Linear Regression` function.
- #1064: With this PR, I expanded the `Numpy_intrinsic` library with `Tanh` and `Exp` functions along with extensive tests.
- #1075: With this PR, I expanded the `Numpy_intrinsic` library with `Arcsinh` and `Arccosh` functions along with extensive tests.
- #1081: With this PR, I expanded the `Numpy_intrinsic` library with `Arctanh` function along with extensive tests.
- #1083: With this PR, I expanded the `Numpy_intrinsic` library with `Floor` and `Ceil` functions along with extensive tests.
- The Numpy_intrinsic library, a high-priority library, still has several functions left to implement, and I will continue working on them in the coming weeks.
- Several other important libraries are yet to be implemented, such as String, Number, Fraction, Decimal, OS, Itertools, Functools, Time, etc. However, most of these require classes to be supported in the backend, so that work can proceed in parallel.
- Most functions are implemented for lists; as we move forward, we should overload them for tuples (recently implemented), sets (not implemented), and dictionaries (not implemented).
Over the course of the project, I have grown a lot. I implemented, in the backend, functions that I had simply been using all these years. I learned about the algorithms, the testing, and the level of detail that goes into making them. I learned about constructing compilers and their various stages. Writing code for the corner cases that we usually tend to ignore was very interesting. For example, in LPython it is not possible to add an integer to a floating-point number directly; a cast must be performed before the addition. Participating in GSoC has definitely improved my interpersonal and technical skills. Here are some of the prominent ones:
- Collaborating with different teams across different time zones.
- Understanding and implementing the opinions and suggestions of different stakeholders in the project.
- Studying the workflow of projects and possible customizations that can be made.
- Analysing different approaches to solve a problem along with their pros and cons.
- Making a simple and easy-to-follow process that can easily be understood and followed by the community members.
- Writing clean and well-commented code.
- Understanding and reviewing code written by someone else and modifying it.
- Implementing checks and error handling to alert the user in case of any malfunction.
- Learning about and working with various tech stacks and technologies like C++, Python, LLVM, libasr, ASTs, etc.
- Effectively communicating the project details by writing weekly blogs and sending updates to the whole community.
Overall, this project has made me a better programmer, problem solver, debugger, tester, and a team player!
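The integer and floating-point corner case mentioned earlier can be sketched as follows. This is plain Python, where the explicit cast is optional; it is written the way the report describes LPython requiring it. The function name `add_mixed` is hypothetical and exists only for this example.

```python
# Illustrative sketch only: an explicit int-to-float cast before addition.
# In CPython the cast is optional; the report states LPython requires it.

def add_mixed(i: int, x: float) -> float:
    # Convert the integer operand first so both operands are floats.
    return float(i) + x
```

Writing the cast out also makes the intended result type obvious to anyone reading the code later.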
The project would not have been possible without the ongoing support of many people. I would especially like to thank:
- Gagandeep Singh for being the most supportive and patient mentor. If it weren't for your guidance, sir, I wouldn't have made it so far.
- Ondřej Čertík, Smit Lunagariya, and Naman Gera for discussing, suggesting improvements and assisting me throughout the project.
- Ubaid Shaikh, for being a great partner. I am really grateful for his insights and discussions.
- The LPython Community for discussing the project during community calls and being so appreciative of it.
- Friends and family for their encouragement and moral support throughout this project.