@acrosby
Last active December 9, 2020 15:50
Sample code for python rtree bulk loading of 10,000,000 points and a nearest neighbor query.
from rtree import index
from random import random
from datetime import datetime

timer = datetime.now()

# Create 10,000,000 random numbers between 0 and 1
rands = [random() for i in range(10000000)]

# Function required to bulk load the random points into the index
# Looping over and calling insert is orders of magnitude slower than this method
def generator_function():
    for i, coord in enumerate(rands):
        # Yields (id, (minx, miny, maxx, maxy), obj); coord is also stored as the object
        yield (i, (coord, coord+1, coord, coord+1), coord)

# Add points
tree = index.Index(generator_function())
print((datetime.now()-timer).seconds)  # Elapsed seconds so far (number generation plus bulk load)
print(list(tree.nearest((rands[50], rands[50], rands[50], rands[50]), 3)))
print((datetime.now()-timer).seconds)  # Total elapsed seconds, now including the nearest-3 query
@Tasneem-gh

How can we use the generator function to read data from a file, instead of randomly generated data?

@acrosby
Author

acrosby commented Dec 9, 2020

@Tasneem-gh This was just a speed test of the generator performance, nothing more. I suggest you look into the rtree and Python file I/O documentation if you are interested in serializing and deserializing data.

@Tasneem-gh

Tasneem-gh commented Dec 9, 2020

@acrosby
Thank you for replying
Actually, I am using a different rtree library to build the index, but I thought of using the generator function to speed up the build, because it currently takes around an hour to index 1 million records.

So, serializing data allows reading from a file and I can use the generator function along with that?

@acrosby
Author

acrosby commented Dec 9, 2020

@Tasneem-gh That would depend on whether your rtree library supports a generator as input. If it does, you could read a text file of coordinates line by line with one generator, wrapped in a second generator that processes each line and yields the result. Here is some info that may be helpful: https://realpython.com/introduction-to-python-generators/
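A minimal sketch of that generator-inside-a-generator idea, assuming a text file with one `x,y` pair per line (the file format and the names `read_points` and `index_entries` are hypothetical, chosen only for illustration). The inner generator parses lines lazily; the outer one wraps each point in the `(id, bbox, obj)` tuple shape used by the gist's `generator_function`:

```python
import io

def read_points(lines):
    """Inner generator: parse 'x,y' text lines into (x, y) float pairs."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        x_text, y_text = line.split(",")
        yield float(x_text), float(y_text)

def index_entries(lines):
    """Outer generator: turn each point into an (id, bbox, obj) tuple."""
    for i, (x, y) in enumerate(read_points(lines)):
        # A point is a degenerate box: (minx, miny, maxx, maxy).
        # No extra object is stored here, so obj is None (an assumption).
        yield (i, (x, y, x, y), None)

# Stand-in for an open file so the sketch runs without any data on disk.
sample = io.StringIO("0.1,0.2\n\n0.5,0.9\n")
entries = list(index_entries(sample))
```

With the Python rtree package, the same generator could be passed straight to the constructor, for example `index.Index(index_entries(f))` inside a `with open(path) as f:` block, so the file is never loaded into memory all at once. Whether a different rtree library accepts a generator this way would need to be checked against its own documentation.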

@Tasneem-gh

@acrosby Thanks for the hint. Will try that
