Skip to content

Instantly share code, notes, and snippets.

@kuanb
Created September 15, 2020 06:32
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kuanb/a45b65c3135dce717497643e7f35f0ab to your computer and use it in GitHub Desktop.
Save kuanb/a45b65c3135dce717497643e7f35f0ab to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@danielsclint
Copy link

danielsclint commented Jul 21, 2021

Hi Kuan- This Gist was super helpful for understanding the algorithms presented in the Microsoft paper. In your article, you mentioned, " performance speed up is rather marginal because of the high level of fan-out that occurs on later iterations of transfers."

I implemented your code, and I refactored it to take better advantage of Pandas / Numpy vectorization. The bottlenecks that you experienced were a results of the for loops and the number of times you are slicing the dataframes. If you eliminate the looping, you get a significant time savings. In my simple example for the Sacramento RTD GTFS feed, the code runs in less than a second for 3 transfers. I'm sure additional enhancements are possible with Numpy.

I also got significant performance improvements from caching the walk transfers as you suggested. However, you could still get really quick results with real-time transfer processing if you eliminate the Geopandas intersect methods. Instead, since these are distance calculations (either polar or cartesian) you can quickly create a stop-to-stop cross join and do the distance calculation with simple arithmetic (trigonometry) to get distances between stops and filter to the appropriate lengths. Using the arithmetic approach, I can calculate all of the stop-to-stop transfers in just a second or two for Sacramento.

In any event, I wanted to say thank you for putting this example out here, and it was super helpful.

Clint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment