
Final Report for Google Summer of Code 2023

PyLops, NumFOCUS

Project: MPI backend for distributed inverse problems

Details

Name: Rohan Babbar
Organization: NumFOCUS
Sub-Organization: PyLops
Mentors: Matteo Ravasi, Yuxi Hong, Carlos da Costa

Abstract

PyLops is an open-source Python library that provides a backend-agnostic, idiomatic, matrix-free collection of linear operators and related computations, developed to solve large-scale inverse problems. Since the memory of a single machine limits the size of the models and data that can be handled, there is a need to perform these computations in a distributed fashion.

The main objective of this project is to use the mpi4py package to exchange messages among processes across distributed memory. Concretely, the project introduces a new library, pylops-mpi, which leverages MPI communication to carry out large-scale operations and inversions efficiently and in parallel. pylops-mpi is intended to be the preferred choice when model and data sizes are so large that performing operations on a single node or process becomes challenging.

Work Done

Important links related to the project:

Below is a list of the tasks I completed during Google Summer of Code 2023, along with the corresponding pull requests for each task. The repository was started from scratch, and the majority of the pull requests were authored by me. The code has been merged together with test cases written using pytest and pytest-mpi. Additionally, each new operator/class is accompanied by a small toy example illustrating its functionality.

  • Set up CI Pipeline and Sphinx Documentation - Created a continuous integration (CI) pipeline on GitHub Actions using setup-mpi, and configured web-based Sphinx documentation to host comprehensive documentation and examples.

    #8 - Integrated Sphinx to create comprehensive and well-structured documentation.
    #10 - Linting using flake8 as a GitHub Action.
    #12 - Included build.yml as a GitHub Action that automatically triggers whenever a pull request is created. This action involves building the package and conducting tests using pytest-mpi.
    #13 - Added deploy-docs.yml as a GitHub Action, designed to deploy the documentation using GitHub Pages.

  • DistributedArray class - This is the fundamental array class used throughout the library, offering an efficient alternative to working directly with NumPy arrays. It allows a large NumPy array to be partitioned into smaller local arrays distributed across different ranks, and it also simplifies broadcasting a NumPy array to all processes. A minimal usage sketch is shown below.

    #2 - Initial version of the DistributedArray class with parameters such as global_shape, base_comm, partition and dtype. The class provides operations such as addition and multiplication of DistributedArrays.
    #6 - Partition is now an enum with options of Partition.SCATTER and Partition.BROADCAST.
    #7 - Introduced the axis parameter, allowing partitioning along the specified axis. Furthermore, also implemented plotting functions for the DistributedArray, enhancing visualization and understanding.
    #11 - Added dot and norm methods (mirroring numpy.dot and numpy.linalg.norm) to the DistributedArray class; both flatten the local arrays and ultimately yield a scalar as output.
    #49 - Added an add_ghost_cells method to the DistributedArray, which facilitates the addition of ghost cells at the front, the back, or both sides of the local array at each rank along the axis of partition.
    #61 - Added a new parameter called local_shapes to the DistributedArray, which allows users to assign shapes to local arrays at each rank.
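
Below is a minimal sketch of how a DistributedArray might be used. The parameter and method names (global_shape, base_comm, partition, axis, dtype, dot, norm) follow this report, but the exact signatures are assumptions and may differ slightly from the released pylops-mpi API.

```python
# Minimal DistributedArray sketch; run with e.g. `mpiexec -n 4 python example.py`.
import numpy as np
from mpi4py import MPI
from pylops_mpi import DistributedArray, Partition

comm = MPI.COMM_WORLD

# Scatter a (1000, 10) array along axis 0, one chunk per rank
x = DistributedArray(global_shape=(1000, 10), base_comm=comm,
                     partition=Partition.SCATTER, axis=0, dtype=np.float64)

# Each rank fills only its local chunk
x[:] = np.ones(x.local_array.shape)

# Reductions flatten the local arrays and return the same scalar on every rank
print(comm.Get_rank(), x.dot(x), x.norm())
```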

  • Added MPILinearOperator - This class serves as the parent class for all of our MPI operators. When creating a new MPI operator, one needs to extend this class and provide three essential members: the operator's shape, the operator's data type (dtype), and the base MPI communicator (base_comm). A sketch of a custom operator is shown below.

    #26 - Added the MPILinearOperator as a standalone base class, along with various functions such as adjoint, transposition, summation, product, complex conjugate, and scaling for MPILinearOperators.
    #48 - Integrated support for Partition.BROADCAST in the _matvec and _rmatvec operations to facilitate matrix-vector products.
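
The following is a hedged sketch of how a new operator might extend MPILinearOperator. The three members named above (shape, dtype, base_comm) and the _matvec/_rmatvec hooks follow this report; the MPIScale operator itself and the exact constructor signature are illustrative assumptions.

```python
# Hypothetical element-wise scaling operator built on MPILinearOperator.
import numpy as np
from mpi4py import MPI
from pylops_mpi import MPILinearOperator, DistributedArray


class MPIScale(MPILinearOperator):
    """Multiply a DistributedArray by a scalar, rank by rank."""

    def __init__(self, n, alpha, base_comm=MPI.COMM_WORLD, dtype="float64"):
        self.alpha = alpha
        super().__init__(shape=(n, n), dtype=np.dtype(dtype), base_comm=base_comm)

    def _matvec(self, x):
        # Forward: scale the local portion of the model vector
        y = DistributedArray(global_shape=self.shape[0], base_comm=self.base_comm,
                             dtype=self.dtype)
        y[:] = self.alpha * x.local_array
        return y

    def _rmatvec(self, x):
        # Adjoint: scaling by a real scalar is self-adjoint
        y = DistributedArray(global_shape=self.shape[1], base_comm=self.base_comm,
                             dtype=self.dtype)
        y[:] = self.alpha * x.local_array
        return y
```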

  • MPI Stacking Operators - These operators stack one or more PyLops operators across different ranks, whether horizontally, vertically, or block-diagonally. Each rank's operator performs its matrix-vector product on its own portion of the model vector and produces its portion of the data vector; both the model and data vectors are represented as DistributedArray objects. A usage sketch is shown below.

    #26 - Developed an MPIBlockDiag operator that constructs a block-diagonal operator from a set of linear operators using MPI.
    #36 - Developed an MPIVStack operator that constructs a vertical stack of a set of linear operators using MPI.
    #45 - Developed an MPIHStack operator that constructs a horizontal stack of a set of linear operators using MPI.
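
Below is a minimal sketch of block-diagonal stacking with MPIBlockDiag, to be launched with something like `mpiexec -n 4 python example.py`. The sizes are illustrative and the constructor arguments are assumptions that may differ from the released API.

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi

comm = MPI.COMM_WORLD
n = 10

# Each rank contributes one small PyLops operator to the global block diagonal
Dop = pylops.Diagonal(np.full(n, comm.Get_rank() + 1.0))
BDiag = pylops_mpi.MPIBlockDiag(ops=[Dop])

# Model and data are DistributedArrays, scattered evenly across ranks
x = pylops_mpi.DistributedArray(global_shape=n * comm.Get_size())
x[:] = np.ones(n)
y = BDiag @ x  # each rank applies its own block to its local chunk
```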

  • Iterative Solvers for Inversion utilizing DistributedArray - These solvers solve a system of equations, given an MPILinearOperator Op and a DistributedArray data vector y, using conjugate-gradient iterations; a sketch is shown below.

    #29 - Implemented cgls for iterative inversion; users can solve an overdetermined system of equations by passing the MPILinearOperator Op and the distributed data y as parameters.
    #57 - Updated tests in test_solver to ensure that the MPIBlockDiag and MPIVStack operators used are positive definite and therefore fully invertible.
    #73 - Implemented cg for iterative inversion; users can solve a square system of equations by passing the MPILinearOperator Op and the distributed data y as parameters.
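
Here is a hedged sketch of a distributed inversion with cgls. The import path and keyword arguments are assumptions modelled on PyLops' own solvers and may differ in pylops-mpi.

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops_mpi.optimization.basic import cgls  # assumed import path

comm = MPI.COMM_WORLD
n = 10

# Well-conditioned block-diagonal operator: one invertible block per rank
Op = pylops_mpi.MPIBlockDiag(ops=[pylops.Diagonal(np.arange(1, n + 1, dtype=float))])

x = pylops_mpi.DistributedArray(global_shape=n * comm.Get_size())
x[:] = np.ones(n)
y = Op @ x

# Starting guess and conjugate-gradient least-squares iterations
x0 = pylops_mpi.DistributedArray(global_shape=n * comm.Get_size())
x0[:] = 0
xinv = cgls(Op, y, x0=x0, niter=50, show=(comm.Get_rank() == 0))[0]
```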

  • Derivative Operators - These operators apply finite-difference stencils to a DistributedArray; border cells are communicated between neighbouring processes in order to compute the derivative. A sketch is shown below.

    #55 - Added MPIFirstDerivative, which applies a first derivative to a DistributedArray using either a first-order forward or backward stencil, or a second- or third-order centered stencil.
    #58 - Added MPISecondDerivative, which applies a second derivative to a DistributedArray using a second-order forward, backward, or centered stencil.
    #60 - Added MPILaplacian which applies a second-derivative along multiple directions of a DistributedArray.
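
A short sketch of applying a distributed first derivative follows; the constructor arguments are modelled on pylops.FirstDerivative and are assumptions that may differ in pylops-mpi.

```python
import numpy as np
from mpi4py import MPI
import pylops_mpi

comm = MPI.COMM_WORLD
nloc = 20
ntot = nloc * comm.Get_size()

# First derivative over a 1D signal distributed across ranks
Fop = pylops_mpi.MPIFirstDerivative(dims=(ntot,), sampling=1.0)

# Global linear ramp, filled rank by rank, so the derivative is ~1 everywhere
x = pylops_mpi.DistributedArray(global_shape=ntot)
x[:] = np.arange(comm.Get_rank() * nloc, (comm.Get_rank() + 1) * nloc, dtype=float)

y = Fop @ x  # border (ghost) cells are exchanged between neighbouring ranks
```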

  • Gallery Examples - Added a variety of examples for each newly created operator/class.

    #7 - Added DistributedArray examples.
    #43 - Added cgls example.
    #44 - Added stacking examples using MPIBlockDiag, MPIVStack and MPIHStack.
    #48 - Included an example demonstrating how to use asmpilinearoperator to wrap a pylops operator, combine it with other basicoperators, and subsequently execute matrix-vector products.
    #58 - Added derivative examples.

  • Post-Stack Inversion Tutorial - This tutorial demonstrates a distributed 3D post-stack inversion. The first part shows how to model 3D synthetic post-stack seismic data from a 3D model of the subsurface acoustic impedance in a distributed manner; the second part carries out the inversion. A modelling sketch is shown below.

    #28 - Introduced the initial version of the Post-Stack Inversion example. In this example, the post-stack linear model is passed to an MPIBlockDiag operator, followed by a matrix-vector product. The resulting output is then plotted, displaying the model, the smoothed model, and the data vector.
    #38 - Updated the tutorial to perform the inversion of the MPIBlockDiag operator using the cgls solver, and added a regularized inversion that applies a second derivative along the non-distributed dimensions.
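
The following hedged sketch illustrates the distributed modelling step described above: each rank wraps a post-stack modelling operator for its slice of the impedance model inside an MPIBlockDiag. The sizes and wavelet are illustrative, and the wiring may differ from the actual tutorial.

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi

comm = MPI.COMM_WORLD
nz, nx_local = 100, 10  # depth samples, traces handled by this rank

# Ricker wavelet and a post-stack modelling operator for this rank's slice
wav = pylops.utils.wavelets.ricker(np.arange(41) * 0.004, f0=15)[0]
PPop = pylops.avo.poststack.PoststackLinearModelling(wav, nt0=nz,
                                                     spatdims=(nx_local,))
BDiag = pylops_mpi.MPIBlockDiag(ops=[PPop])

# Distributed log-impedance model, flattened rank by rank
m = pylops_mpi.DistributedArray(global_shape=nz * nx_local * comm.Get_size())
m[:] = np.ones(nz * nx_local)

d = BDiag @ m  # distributed synthetic post-stack data, later inverted with cgls
```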

  • Least-Squares Migration (LSM) Tutorial - In this tutorial, sources are distributed across different ranks, and each pylops.waveeqprocessing.LSM performs modelling of the subsurface reflectivity at its own rank.

    #50 - Used MPIVStack to stack the LSM operators vertically and computed the inversion using the cgls solver.

  • Some More Bug Fixes and Enhancements - These include other important changes made to the repository.

    #24 - Added mpi_examples.sh, which runs all files in a given folder using mpiexec; also added run_examples and run_tutorials targets to the Makefile.
    #32 - All tests use numpy.testing.assert_allclose with rtol=1e-14; this is the standard practice and should be followed when adding any new test.
    #37 - When the partition is set to Partition.BROADCAST, modifications made by other ranks are discarded and overwritten with the value at rank 0.
    #55 - Introduced a reshaped decorator, which carries out extra communication under the hood if the local shapes are not perfectly aligned with the operator at each rank.

Future Work

  • Maintaining the repository
  • Adding new MPI linear operators to the library
  • Contributing more to the PyLops organization

Final Words

Nearly all the proposed work outlined in the proposal has been successfully completed, accompanied by thorough documentation and suitable test cases. Collaborating with the PyLops organization has been an absolute pleasure, and these past 3 months have been truly remarkable. I have learned a great deal, ranging from adopting good coding practices to diving deeper into MPI, diverse linear operators, and inversion solvers. I am grateful to all my mentors for their guidance, motivation, and their willingness to teach me new concepts. This summer has been both hectic and enjoyable, and I eagerly look forward to making further contributions to PyLops in the future.
