enjeck/sagemath-gsoc.md

## sagemath-gsoc.md

      
    Raw
  

              sagemath-gsoc.md
            
          
    This document is my final submission for the GSoC project Edge connectivity and edge disjoint spanning trees in digraphs for SageMath under the mentorship of David Courdert.
Intro

During the contribution period, I installed a local version of SageMath on my system and made several minor bug fixes to familiarize myself with the codebase:

BipartiteGraph() fails to create graph from graph6 string
canonical_label returns incorrect partite sets

Thanks to these minor contributions, I was a bit familiar with the codebase and the code submission process by the time my project started.
My proposal was to implement an algorithm for finding the edge connectivity of directed graphs from a research paper [1]. Before the coding period started, there was already Cython code that implemented the algorithm proposed by Gabow [1] for computing the edge connectivity of directed graphs. This meant the entire scope of my project had to change. I started by looking at the TODOs of the existing code. At that point in time, the TODOs were:
- Add speedup methods proposed in GKLP2021 [2] for the edge connectivity
- Implement the tree-packing algorithms proposed in Gabow1995 [1] and BHKP2008 [3]
- Extend to digraphs with multiple edges
- Extend to weighted digraphs

There was another contributor working with me on this project. While she handled TODO number 2, I worked on TODO number 1. I spent a lot of time reading and re-reading the paper GKLP2021 [2], especially section 3.3.2 with a paragraph titled Fast initialization. My newly-defined project focused on speeding up the existing edge connectivity code by adding a depth-first search based approach to initialize the various variables.
Key Challenges

I didn't face any major issues with running a local version of SageMath or submitting my code during the coding period, since I made sure that my environment was ready during the contribution period.
The greatest challenge was understanding the existing Cython implementation and figuring out where/what to modify. I also struggled with understanding the various research papers, especially since I don't have a graph theory background and had to learn everything from scratch.
Work Done

All my code submissions for the project were done under this ticket: https://trac.sagemath.org/ticket/34123. This ticket has been merged \o/
My first working implementation wasn't great. First of all, it was slower than the existing code which is a step backwards because the goal was the speedup the code. Secondly, it didn't work on all types of graphs. It only worked on complete digraphs. For other graphs, it either didn't work or I got stuck in an infinite recursive loop (I crashed my computer several times because of this bug). It took several weeks of trying various things before figuring out a solution to the infinite recursive loop. Fixing this also sped up my code, making it about 2 times faster than the original code for some graphs.
It was a relief that my code worked, but I wasn't sastisfied because the research paper claimed that the changes made should result in a 9x speedup, not the barely 2x speedup I was currenly receiving. My mentor suggested implementing an iterative version of the DFS-based intialization, with hopes that it might be faster than the recursive version. In the end, the iterative version was a bit faster for some graphs, and the recursive faster for others. We decided to keep both versions, and added extra parameters to the edge connectivity class to permit the user to select whether or not to use the "fast initialization" and to either select the recursive or iterative version.
Work left

As earlier stated, the current fast initialization is not as fast as we hoped. Maybe I'm missing something from the paper. Maybe Cython can't bring about the same speedups as realized from the paper's author C code.
There are still more TODO's left for this edge connectivity class. TODO number 2 involving tree-packing is still in progress. I intend to contribute to this at some point, most likely during the testing phase. In the meantime, I have been picking up on a ticket I started and didn't finish during the contribution period: Enumeration of Spanning trees in increasing order of weights
What I learned

When I looked into applying for GSoC months ago, as a mathematics and computer science student, my main goal was to work on something that was at the intersection of mathematics and computer science. I'm so grateful to have been chosen and mentored under the SageMath project. Over the past 8 months, I have learned:

Graph Theory: After learning so much about graph theory and falling in love with it, I'm considering going to grad school and exploring more research under this topic.
Algorithms: My knowledge of algorithms, especially graph algorithms, has vastly improved. I have learned about various ways of implementing algorithms efficiently.
Reading and writing code: It's definitely a skill to be able to read others code and understand it. I also learned how to write readable code and good practices like adding comments whereever necessary.
Persistence: There were times where I worked on a single bug for weeks, and didn't give up. It's definitely rewarding to struggle on something for so long and eventually get it to work!


Bibliography


[1] Harold N. Gabow: A Matroid Approach to Finding Edge Connectivity and Packing Arborescences. Journal of Computer and System Sciences, 50(2):259-273 (1995)
[2] Loukas Georgiadis, Dionysios Kefallinos, Luigi Laura, Nikos Parotsidis. An Experimental Study of Algorithms for Computing the Edge Connectivity of a Directed Graph.  SIAM Symposium on Algorithm Engineering and Experiments (ALENEX), pp 85-97, 2021. :doi:10.1137/1.9781611976472.7
[3] Anand Bhalgat, Ramesh Hariharan, Telikepalli Kavitha and Debmalya Panigrah. Fast edge splitting and Edmonds' arborescence construction for unweighted graphs. ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 455-464, 2008. :doi:10.5555/1347082.1347132