Facility Location Modeling Package

Summary List

  1. Fork the repo pysal/spopt
  2. Create a dev branch gsoc21-facloc in my forked repo
  3. Create a first notebook to get familiar with facility location problems, following the example provided by mentor J. Gaboardi
  4. Design the architecture of the package using Builder Design Pattern for constraints
  5. Create the four models proposed: LSCP, MCLP, P-Median, P-Center
  6. Design mixins for derived results
  7. Develop tests for the module
  8. Code documentation
  9. Create example notebooks for the models
  10. Develop plots of the results
  11. Create a real-world problem notebook using Huanfa's dataset
  12. Pull request

Task 0.

I applied to this project without much knowledge of the subject. Before applying, I went through the book links provided on the project list page to get a sense of what I would write in my proposal, so that it matched what was expected.

Task 1.

After my proposal was approved, I delved into the repository, forked it, and set up my environment with the requirements. I also searched the issues to see whether the project already had relevant discussion or an architecture style to follow. Tasks 1 and 2 were done before and during the onboarding phase.

Task 2.

I created a dev branch named gsoc21-facloc with the future in mind: it would make creating and merging a pull request easier. This was my development branch, and everything I did was pushed to it. I also planned to push a commit regularly, roughly once a day, though it was fine when that wasn't possible.

Task 3.

The coding phase began and I created the first notebook, using the example provided by mentor J. Gaboardi as a base. In that notebook I coded all four models, testing them with an empirical example from J. Gaboardi's notebook. It was a great start for getting familiar with the optimization package PuLP, which would be used to build the module.
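To give a flavour of that first notebook, here is a minimal, self-contained sketch of an LSCP-style model written directly in PuLP; the tiny coverage matrix is made up for illustration and is not the data used in the notebook.

```python
# A minimal LSCP-style model sketched in PuLP with hypothetical data.
import pulp

# coverage[i][j] == 1 if candidate facility j can serve client i (made-up data)
coverage = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
n_clients = len(coverage)
n_facilities = len(coverage[0])

prob = pulp.LpProblem("LSCP_sketch", pulp.LpMinimize)

# y_j == 1 if facility j is sited
y = pulp.LpVariable.dicts("y", range(n_facilities), cat=pulp.LpBinary)

# objective: minimize the number of sited facilities
prob += pulp.lpSum(y[j] for j in range(n_facilities))

# each client must be covered by at least one sited facility
for i in range(n_clients):
    prob += pulp.lpSum(coverage[i][j] * y[j] for j in range(n_facilities)) >= 1

prob.solve()
print(pulp.LpStatus[prob.status], [y[j].value() for j in range(n_facilities)])
```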

Task 4.

After the first meeting, I was asked to begin developing the module with functions and classes. This task was tough, because starting a development process and designing a module architecture are always difficult. I started with LSCP and MCLP, since both are coverage models and working on them together could suggest improvements to the design. I thought of building a model step by step: first define the variables, then the constraints, and finally add the objective function. I realized this process was similar to the Builder pattern, so I tried it and it worked smoothly. At the next meeting we discussed the architecture, the mentors agreed with it, and I continued using this style.
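Here is a rough sketch of that builder-style, step-by-step construction; the class and method names are illustrative and are not the actual spopt API.

```python
# Illustrative builder-style construction: variables, then constraints, then objective.
import pulp


class CoverageModelBuilder:
    """Build a coverage model one step at a time (hypothetical class)."""

    def __init__(self, name, sense=pulp.LpMinimize):
        self.problem = pulp.LpProblem(name, sense)
        self.facility_vars = None

    def add_facility_variables(self, n_facilities):
        self.facility_vars = pulp.LpVariable.dicts(
            "y", range(n_facilities), cat=pulp.LpBinary
        )
        return self

    def add_coverage_constraints(self, coverage):
        # each client must be covered by at least one sited facility
        for row in coverage:
            self.problem += (
                pulp.lpSum(row[j] * self.facility_vars[j] for j in range(len(row))) >= 1
            )
        return self

    def add_minimize_facilities_objective(self):
        self.problem += pulp.lpSum(self.facility_vars.values())
        return self

    def build(self):
        return self.problem


# usage: chain the steps in order
problem = (
    CoverageModelBuilder("LSCP_builder_sketch")
    .add_facility_variables(3)
    .add_coverage_constraints([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
    .add_minimize_facilities_objective()
    .build()
)
```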

Task 5.

This task was the easiest of all of GSoC because the architecture had already been worked out; I only had to follow it and fix minor inconsistencies during development. First I finished MCLP and LSCP and tested them globally, checking only that a solved model is an instance of the expected class. After that, I began integrating the tests with GitHub Actions. I had never used GitHub Actions, and it was a bit hard because of some details with the PuLP package, but I got it working exactly as expected. At the next meeting, the mentors asked me to develop the last two models, PCenter and PMedian. I developed and tested them globally as I had for MCLP and LSCP, and with that the first month of GSoC was done.
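The snippet below sketches that "global" test style: it only checks that solve() hands back an object of the expected class. The tiny FakeLSCP class is a stand-in for illustration, not the real spopt model.

```python
# A self-contained sketch of a global test: assert only the type of the solved object.
import pulp


class FakeLSCP:
    def __init__(self, problem):
        self.problem = problem

    def solve(self, solver):
        self.problem.solve(solver)
        return self


def test_solve_returns_lscp_instance():
    problem = pulp.LpProblem("tiny", pulp.LpMinimize)
    y = pulp.LpVariable("y", cat=pulp.LpBinary)
    problem += pulp.lpSum([y])  # trivial objective
    problem += y >= 1           # trivial constraint
    model = FakeLSCP(problem).solve(pulp.PULP_CBC_CMD(msg=False))
    assert isinstance(model, FakeLSCP)
```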

Task 6.

Each model produces results, and some of them can be turned into summary statistics. However, some results are not shared by every model; in other words, one model can produce a result that the others can't. This is a real architecture-design problem, the same kind of problem as in Task 5. Levi proposed using another design pattern, the mixin: a minimal class with a single method that can be inherited so that the inheriting class gains that functionality. To summarize the results produced by each model I developed three mixins: CoveragePercentage, MeanDist, and BaseOutputMixin. The first calculates the percentage of demand covered by MCLP, the second calculates the average distance of the PMedian result, and BaseOutputMixin provides two pieces of functionality: an array of uncovered clients and a client-facility lookup.
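A rough sketch of the mixin idea follows, with illustrative names; the actual mixins in spopt differ in detail.

```python
# Illustrative mixin: a minimal class contributing one derived statistic.
import numpy


class MeanDistanceMixin:
    """Adds a mean-distance statistic to any model that stores its assignments."""

    def get_mean_distance(self):
        # assumes the model recorded a cost matrix and a client -> facility assignment
        distances = [self.cost_matrix[i, j] for i, j in enumerate(self.assignments)]
        return float(numpy.mean(distances))


class FakePMedian(MeanDistanceMixin):
    def __init__(self, cost_matrix, assignments):
        self.cost_matrix = cost_matrix
        self.assignments = assignments


model = FakePMedian(numpy.array([[1.0, 4.0], [3.0, 2.0]]), assignments=[0, 1])
print(model.get_mean_distance())  # (1.0 + 2.0) / 2 == 1.5
```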

Task 7.

This task was the hardest one. I had no prior experience with tests, but I knew they are important for open source and for the maintenance of any software. First, the mentors asked me to write a global test for every model, checking that when the solve method is called the returned model is an instance of the expected class. After good progress on model development, they asked me to write more specific tests: whether the model is solved optimally or the problem is infeasible. After those, we wanted to cover as much of the code as possible, so more tests followed: tests for errors, tests for warnings that should be raised under certain conditions, and tests for functionality such as the mixins. However, we hit a lot of failures along the way, especially with the facility-client lookup. After some searching and discussion with the mentors, we realized that PuLP does not solve every model the same way on different operating systems, so we moved some exact checks to synthetic data, where we have more control over the results.
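A minimal sketch of an infeasibility test in that spirit is below; the RuntimeError convention and the helper function are assumptions for illustration, not necessarily what spopt does.

```python
# Sketch of a test that an infeasible model is reported as an error.
import pulp
import pytest


def solve_or_raise(problem, solver):
    problem.solve(solver)
    if problem.status != pulp.LpStatusOptimal:
        raise RuntimeError("model is not optimally solved")
    return problem


def test_infeasible_model_raises():
    problem = pulp.LpProblem("infeasible", pulp.LpMinimize)
    x = pulp.LpVariable("x", lowBound=0)
    problem += pulp.lpSum([x])
    problem += x >= 2
    problem += x <= 1  # contradicts the previous constraint
    with pytest.raises(RuntimeError):
        solve_or_raise(problem, pulp.PULP_CBC_CMD(msg=False))
```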

Task 8.

I wrote the docstrings for classes and functions following the Diátaxis framework and the pysal style.
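For illustration, here is a docstring in the numpydoc style that pysal projects follow; the function itself is a stand-in, not code from the module.

```python
# Stand-in function showing the numpydoc docstring layout used across pysal.
def solve(self, solver):
    """Solve the model with a PuLP solver.

    Parameters
    ----------
    solver : pulp.apis.LpSolver
        A solver supported by PuLP, e.g. ``pulp.PULP_CBC_CMD()``.

    Returns
    -------
    self : object
        The solved model instance.
    """
    raise NotImplementedError
```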

Task 9.

For reproducibility and examples, the mentors asked me to write notebooks. We decided to use synthetic data for simplicity, so each notebook has the specific aim of explaining one model with mathematical notation and illustrating it with the synthetic data.
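A small sketch of the kind of synthetic data such notebooks rely on is shown below; the sizes and random seed are arbitrary choices, not the ones used in the actual notebooks.

```python
# Generate random clients and candidate facilities, plus a Euclidean cost matrix.
import numpy

rng = numpy.random.default_rng(0)
client_points = rng.uniform(0, 10, size=(100, 2))   # 100 demand points
facility_points = rng.uniform(0, 10, size=(5, 2))   # 5 candidate facility sites

# pairwise Euclidean cost matrix (clients x facilities)
cost_matrix = numpy.linalg.norm(
    client_points[:, None, :] - facility_points[None, :, :], axis=2
)
print(cost_matrix.shape)  # (100, 5)
```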

Task 10.

This task was great fun; plotting is always exciting. We developed a plot to show the results of each model. For this task, images show the outcome better than an explanation of how it was done.
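A rough matplotlib sketch of this kind of result plot follows; the points, the selected facility indices, and the marker choices are all illustrative.

```python
# Plot clients, candidate sites, and (hypothetically) selected facilities.
import matplotlib.pyplot as plt
import numpy

rng = numpy.random.default_rng(0)
client_points = rng.uniform(0, 10, size=(100, 2))
facility_points = rng.uniform(0, 10, size=(5, 2))
selected = [1, 3]  # hypothetical indices of the sited facilities

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(client_points[:, 0], client_points[:, 1], s=10, label="clients")
ax.scatter(
    facility_points[:, 0], facility_points[:, 1],
    marker="s", s=60, label="candidate sites",
)
ax.scatter(
    facility_points[selected, 0], facility_points[selected, 1],
    marker="*", s=200, label="selected facilities",
)
ax.legend()
plt.show()
```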

Task 11.

This task was based on a dataset produced by Huanfa Chen while he was writing this paper; the datasets are available online. We used only the San Francisco store data, which consists of 16 candidate facility sites and a census tract with 205 client points. The problem outlined in the paper is to use 4 facilities to cover as many client points as possible (the MCLP model). The result obtained in the paper was 91.6% coverage, and with our approach we got 87.6%.

Task 12.

Finally, the pull request was merged. 🎉 The test suite also achieved an excellent Codecov report, with almost 100% code coverage 🚀.

Further Work

Possible module enhancements for the future:

  • Add service capacity
  • Add a backup coverage model
  • Add polygon partial coverage