Created
September 15, 2020 06:32
-
-
Save kuanb/a45b65c3135dce717497643e7f35f0ab to your computer and use it in GitHub Desktop.
attempt to improve on https://gist.github.com/kuanb/5908ebde364122a0ff12a6f1d734537a
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi Kuan- This Gist was super helpful for understanding the algorithms presented in the Microsoft paper. In your article, you mentioned, " performance speed up is rather marginal because of the high level of fan-out that occurs on later iterations of transfers."
I implemented your code, and I refactored it to take better advantage of Pandas / Numpy vectorization. The bottlenecks that you experienced were a results of the
for
loops and the number of times you are slicing the dataframes. If you eliminate the looping, you get a significant time savings. In my simple example for the Sacramento RTD GTFS feed, the code runs in less than a second for 3 transfers. I'm sure additional enhancements are possible with Numpy.I also got significant performance improvements from caching the walk transfers as you suggested. However, you could still get really quick results with real-time transfer processing if you eliminate the Geopandas intersect methods. Instead, since these are distance calculations (either polar or cartesian) you can quickly create a stop-to-stop cross join and do the distance calculation with simple arithmetic (trigonometry) to get distances between stops and filter to the appropriate lengths. Using the arithmetic approach, I can calculate all of the stop-to-stop transfers in just a second or two for Sacramento.
In any event, I wanted to say thank you for putting this example out here, and it was super helpful.
Clint