Name: Rohan Babbar
Organization: NumFocus
Sub-Organization: PyLops
Mentors: Matteo Ravasi, Yuxi Hong, Carlos da Costa
PyLops is an open-source Python library focused on providing a backend-agnostic, idiomatic, matrix-free library of linear operators and related computations. It has been developed to solve large-scale inverse problems. Because the memory of a single machine is limited, these computations often need to be performed in a distributed fashion.
The main objective of this project is to use the mpi4py package to exchange messages among processes running on distributed memory. Specifically, the project introduces a library called pylops-mpi, which uses MPI communication to carry out large-scale operations and inversions efficiently and in parallel. pylops-mpi is intended to be the preferred choice when the model and data are so large that performing operations on a single node/process is challenging.
Important links related to the project:
- Project Idea - MPI backend for distributed inverse problems
- Project Proposal - Selected Proposal
- Project Repository - pylops-mpi
- Merged Pull Requests - GSOC 2023 PRs
- Blogs - Week 1 & 2, Week 3 & 4, Week 5 & 6, Week 7 & 8, Week 9 & 10
Below is a list of the tasks I completed during Google Summer of Code 2023, along with the pull requests related to each task. The repository was started from scratch, and the majority of the pull requests were authored by me. The code has been merged together with test cases written using pytest and pytest-mpi. Additionally, each new operator/class is accompanied by a toy example that illustrates its functionality.
- Setup CI Pipeline and Sphinx Documentation - Created a continuous integration (CI) pipeline on GitHub Actions by utilizing setup-mpi. Additionally, I configured a web-based Sphinx documentation setup to host comprehensive documentation and examples.
  #8 - Integrated Sphinx to create comprehensive and well-structured documentation.
  #10 - Added linting using flake8 as a GitHub Action.
  #12 - Included build.yml as a GitHub Action that is triggered automatically whenever a pull request is created. This action builds the package and runs the tests using pytest-mpi.
  #13 - Added deploy-docs.yml as a GitHub Action, designed to deploy the documentation using GitHub Pages.
- DistributedArray class - This is the fundamental array class used throughout the library, offering an efficient alternative to using NumPy arrays directly. With this class, a sizable NumPy array can be partitioned into smaller local arrays that are distributed across different ranks. It also simplifies broadcasting a NumPy array to all processes.
  #2 - Initial version of the DistributedArray class with parameters such as global_shape, base_comm, partition and dtype. The class provides operations such as addition and multiplication of DistributedArrays.
  #6 - Partition is now an enum with the options Partition.SCATTER and Partition.BROADCAST.
  #7 - Introduced the axis parameter, allowing partitioning along the specified axis. Also implemented plotting functions for the DistributedArray to aid visualization and understanding.
  #11 - Added the dot product and numpy.linalg.norm methods to the DistributedArray class, which operate on the flattened local arrays and yield a scalar as output.
  #49 - Added the add_ghost_cells method to the DistributedArray, which adds ghost cells at the front, at the back, or on both sides of the local array at each rank along the axis of partition.
  #61 - Added a new parameter called local_shapes to the DistributedArray, which allows users to assign shapes to local arrays at each rank.
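As a rough illustration of the partitioning idea behind DistributedArray, the snippet below emulates a scatter on a single process with plain Python lists. It is a sketch, not the pylops-mpi implementation: the helper names local_split, scatter and distributed_dot are hypothetical, and giving the remainder to the lowest ranks is an assumption made for the example.

```python
def local_split(global_size, n_ranks):
    """Number of elements each emulated rank owns (hypothetical helper).

    Assumption: any remainder is spread over the lowest ranks, one extra
    element each. The real library may split differently.
    """
    base, extra = divmod(global_size, n_ranks)
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


def scatter(global_array, n_ranks):
    """Cut a 1D list into contiguous local chunks, one per emulated rank."""
    chunks, start = [], 0
    for size in local_split(len(global_array), n_ranks):
        chunks.append(global_array[start:start + size])
        start += size
    return chunks


def distributed_dot(chunks_x, chunks_y):
    """Each rank reduces its local chunk; the partial sums are then combined,
    which is the role a global MPI reduction would play."""
    partial = [sum(a * b for a, b in zip(cx, cy))
               for cx, cy in zip(chunks_x, chunks_y)]
    return sum(partial)


x = list(range(10))                        # global array 0..9
local_x = scatter(x, n_ranks=3)
print(local_x)                             # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(distributed_dot(local_x, local_x))   # 285
```

The dot product only needs one scalar per rank to be communicated, which is why flattening the local arrays and reducing to a scalar works well in a distributed setting.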
- Added MPILinearOperator - This class serves as the parent class for all of our MPI operators. A new operator needs to extend this class and provide three essential members: the operator's shape, the operator's data type (dtype), and the base MPI communicator (base_comm).
  #26 - Added the MPILinearOperator as a standalone base class, along with various functions such as adjoint, transposition, summation, product, complex conjugate, and scaling for MPILinearOperators.
  #48 - Integrated support for Partition.BROADCAST in the _matvec and _rmatvec operations to facilitate matrix-vector products.
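To make the parent-class pattern concrete, here is a minimal single-process sketch of a matrix-free operator base class. It is only an illustration under stated assumptions: SketchLinearOperator, the adjoint wrapper and FirstDifference are hypothetical names, and the base_comm member is left out because no MPI communicator is involved here.

```python
class SketchLinearOperator:
    """Hypothetical base class: subclasses supply shape, dtype, _matvec, _rmatvec."""

    def __init__(self, shape, dtype=float):
        self.shape = shape    # (number of data samples, number of model samples)
        self.dtype = dtype

    def matvec(self, x):
        if len(x) != self.shape[1]:
            raise ValueError("model vector has the wrong length")
        return self._matvec(x)

    def rmatvec(self, y):
        if len(y) != self.shape[0]:
            raise ValueError("data vector has the wrong length")
        return self._rmatvec(y)

    @property
    def H(self):
        """Adjoint operator, built by swapping forward and adjoint products."""
        return _SketchAdjoint(self)


class _SketchAdjoint(SketchLinearOperator):
    def __init__(self, op):
        super().__init__((op.shape[1], op.shape[0]), op.dtype)
        self.op = op

    def _matvec(self, x):
        return self.op._rmatvec(x)

    def _rmatvec(self, y):
        return self.op._matvec(y)


class FirstDifference(SketchLinearOperator):
    """y[i] = x[i + 1] - x[i], applied without ever forming a matrix."""

    def __init__(self, n):
        super().__init__((n - 1, n))

    def _matvec(self, x):
        return [x[i + 1] - x[i] for i in range(len(x) - 1)]

    def _rmatvec(self, y):
        out = [0.0] * self.shape[1]
        for i, yi in enumerate(y):
            out[i] -= yi
            out[i + 1] += yi
        return out


op = FirstDifference(4)
x, u = [1, 4, 9, 16], [1, 2, 3]
lhs = sum(a * b for a, b in zip(op.matvec(x), u))     # <Op x, u>
rhs = sum(a * b for a, b in zip(x, op.H.matvec(u)))   # <x, Op^H u>
print(lhs, rhs)   # 34 34.0
```

The final two lines are a dot test: a forward and adjoint pair is consistent when both inner products agree.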
- MPI Stacking Operators - These operators play a crucial role in stacking one or more PyLops operators across different ranks, whether horizontally, vertically, or diagonally. Each operator performs its matrix-vector product with its respective model vector and generates the data vector. Both the model and data vectors are represented as DistributedArray objects.
  #26 - Developed an MPIBlockDiag operator that constructs a block-diagonal operator from a set of linear operators using MPI.
  #36 - Developed an MPIVStack operator that constructs a vertical stack of a set of linear operators using MPI.
  #45 - Developed an MPIHStack operator that constructs a horizontal stack of a set of linear operators using MPI.
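The difference between diagonal and vertical stacking can be sketched on a single process with small dense matrices standing in for the per-rank operators. This is an illustration of the underlying linear algebra only, not the pylops-mpi code; all function names below are hypothetical and real-valued operators are assumed.

```python
def matvec(A, x):
    """Dense matrix-vector product on plain lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Adjoint (transpose) product A^T y for a real-valued matrix."""
    return [sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def blockdiag_matvec(ops, x_chunks):
    """Block-diagonal: rank i applies ops[i] to its own model chunk."""
    return [matvec(A, x) for A, x in zip(ops, x_chunks)]


def vstack_matvec(ops, x):
    """Vertical stack: every rank applies its operator to the same model."""
    return [matvec(A, x) for A in ops]


def vstack_rmatvec(ops, y_chunks):
    """Adjoint of the vertical stack: local adjoints are summed over ranks,
    which is the step where a global reduction would be needed."""
    partial = [rmatvec(A, y) for A, y in zip(ops, y_chunks)]
    return [sum(vals) for vals in zip(*partial)]


ops = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]      # one 2x2 operator per emulated rank
print(blockdiag_matvec(ops, [[1, 1], [2, 3]]))  # [[3, 7], [3, 2]]
print(vstack_matvec(ops, [1, 1]))               # [[3, 7], [1, 1]]
print(vstack_rmatvec(ops, [[1, 0], [0, 1]]))    # [2, 2]
```

A horizontal stack behaves like the adjoint of a vertical stack: its forward product sums the contributions of all ranks, so vstack_rmatvec also shows the communication pattern of a horizontal forward product.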
- Iterative Solvers for Inversion utilizing DistributedArray - Solve a system of equations given an MPILinearOperator Op and a DistributedArray data vector y using conjugate gradient iterations.
  #29 - Implemented cgls, which facilitates iterative inversion. Users can call this method to solve an overdetermined system of equations by passing the MPILinearOperator Op and the distributed data y as parameters.
  #57 - Updated the tests in test_solver to guarantee that MPIBlockDiag and MPIVStack are positive definite matrices and therefore fully invertible.
  #73 - Implemented cg, which facilitates iterative inversion. Users can call this method to solve a square system of equations by passing the MPILinearOperator Op and the distributed data y as parameters.
- Derivative Operators - These operators apply finite-difference stencils to a DistributedArray. They involve communication of border cells from one process to another in order to calculate the derivative.
  #55 - Added MPIFirstDerivative, which applies a first derivative to a DistributedArray using either a first-order backward or forward stencil, or a second- or third-order centered stencil.
  #58 - Added MPISecondDerivative, which applies a second derivative to a DistributedArray using a second-order forward, backward, or centered stencil.
  #60 - Added MPILaplacian, which applies a second derivative along multiple directions of a DistributedArray.
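The need to communicate border cells can be seen in a small single-process emulation of a three-point centered first derivative. This is a sketch under assumptions, not the library code: pad_with_neighbours and distributed_derivative are hypothetical names, one ghost sample per side is assumed, and the two global edges are simply set to zero.

```python
def centered_first_derivative(x, dx=1.0):
    """Reference: 3-point centered stencil on a full array, zero at both edges."""
    y = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        y[i] = (x[i + 1] - x[i - 1]) / (2 * dx)
    return y


def pad_with_neighbours(chunks, rank):
    """Local chunk plus one ghost sample from each existing neighbour.
    In an MPI program these samples would arrive via point-to-point messages."""
    local = list(chunks[rank])
    if rank > 0:
        local = chunks[rank - 1][-1:] + local
    if rank < len(chunks) - 1:
        local = local + chunks[rank + 1][:1]
    return local


def distributed_derivative(chunks, dx=1.0):
    """Apply the stencil rank by rank, then discard the ghost samples."""
    out = []
    for rank in range(len(chunks)):
        d = centered_first_derivative(pad_with_neighbours(chunks, rank), dx)
        start = 1 if rank > 0 else 0
        stop = len(d) - 1 if rank < len(chunks) - 1 else len(d)
        out.append(d[start:stop])
    return out


x = [i * i for i in range(8)]                     # samples of f(i) = i^2
chunks = [x[0:3], x[3:6], x[6:8]]                 # three emulated ranks
local_d = distributed_derivative(chunks)
print(local_d)  # [[0.0, 2.0, 4.0], [6.0, 8.0, 10.0], [12.0, 0.0]]
```

Without the ghost samples, the first and last entries of every local chunk would be treated as edges and the concatenated result would no longer match the derivative of the global array.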
- Gallery Examples - Added a variety of examples for each newly created operator/class.
  #7 - Added DistributedArray examples.
  #43 - Added cgls example.
  #44 - Added stacking examples using MPIBlockDiag, MPIVStack and MPIHStack.
  #48 - Included an example demonstrating how to use asmpilinearoperator to wrap a PyLops operator, combine it with other basic operators, and subsequently execute matrix-vector products.
  #58 - Added derivative examples.
- Post Stack Inversion Tutorial - This tutorial demonstrates the implementation of a distributed 3D post-stack inversion. The first part shows how to model 3D synthetic post-stack seismic data from a 3D model of the subsurface acoustic impedance in a distributed manner, and the second part carries out the inversion.
  #28 - Introduced the initial version of the Post Stack Inversion example. In this example, the Post Stack Linear Model is passed to an MPIBlockDiag operator, followed by a matrix-vector product. The resulting output is then plotted, displaying the model, the smoothed model, and the data vector.
  #38 - Updated the tutorial to invert the MPIBlockDiag operator using the cgls solver, and added a regularized inversion that applies a second derivative along the non-distributed dimensions.
- Least-Squares Migration (LSM) Tutorial - In this tutorial, sources are distributed across different ranks, and each pylops.waveeqprocessing.LSM is responsible for performing modelling with the reflectivity at each rank in the subsurface.
  #50 - Used the MPIVStack to perform vertical stacking of LSMs to solve the problem and calculate the inversion using the cgls solver.
- Some More Bug Fixes and Enhancements - These include some of the important changes made to the repository.
  #24 - Added mpi_examples.sh, which runs all files in a particular folder using mpiexec. Furthermore, added run_examples and run_tutorials to the Makefile.
  #32 - All tests use numpy.testing.assert_allclose with rtol=1e-14, which is the standard practice and should be followed when writing any new test.
  #37 - When the partition is set to Partition.BROADCAST, modifications made by other ranks are discarded and overwritten with the value at rank=0.
  #55 - Introduced a reshaped decorator, which carries out extra communication under the hood if the local shapes are not perfectly aligned with the operator at each rank.
- Maintaining the repository
- Adding new MPI linear operators to the library
- Contributing more to the PyLops organization
Nearly all the proposed work outlined in the proposal has been successfully completed, accompanied by thorough documentation and suitable test cases. Collaborating with the PyLops organization has been an absolute pleasure, and these past 3 months have been truly remarkable. I have learned a great deal, ranging from adopting good coding practices to diving deeper into MPI, diverse linear operators, and inversion solvers. I am grateful to all my mentors for their guidance, motivation, and their willingness to teach me new concepts. This summer has been both hectic and enjoyable, and I eagerly look forward to making further contributions to PyLops in the future.