Skip to content

Instantly share code, notes, and snippets.

@answerquest
Last active August 20, 2023 08:19
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save answerquest/7e77a87a6c2a0654a8eea7da0dfdede4 to your computer and use it in GitHub Desktop.
Save answerquest/7e77a87a6c2a0654a8eea7da0dfdede4 to your computer and use it in GitHub Desktop.
google buildings extract for given area.
MINLAT = 21.0762
MAXLAT = 21.2057
MINLON = 78.9929
MAXLON = 79.1738
DATA_FILE = '3bd_buildings.csv'
OUTPUT_CSV = 'nagpur.csv'
import json, os
def write2file(line):
with open(OUTPUT_CSV, 'a') as file:
file.write(line)
def checkBounds(lat, lon):
global MINLAT, MAXLAT, MINLON, MAXLON
if (MINLAT <= lat <= MAXLAT) and (MINLON <= lon <= MAXLON):
return True
else:
return False
#############
print("Starting script")
counter = 0
foundCounter = 0
with open(DATA_FILE, 'r') as f:
for line in f:
if counter == 0:
# write header to output file and skip
write2file(line)
counter += 1
continue
latlon = line.split(',')[:2]
lat = float(latlon[0])
lon = float(latlon[1])
# Check if the point is inside the polygon
is_inside = checkBounds(lat, lon)
if is_inside:
write2file(line)
foundCounter += 1
counter += 1
if counter % 10**5 == 0:
print(f"Parsed {counter/(10**6)}M lines, found {foundCounter}")
print(f"done, parsed {counter/(10**6)}M rows total, got {foundCounter/(10**3)}K matches")
DATA_FILE = '39f_buildings.csv'
AREA_FILE = 'jalpaiguri.geojson'
OUTPUT_CSV = 'output.csv'
# pip install shapely
import json, os
from shapely.geometry import Point, Polygon
def write2file(line):
with open(OUTPUT_CSV, 'a') as file:
file.write(line)
def fetch_polygon(filename):
with open(filename, 'r') as f:
polygon_data = json.load(f)
polygon_coordinates = polygon_data['features'][0]['geometry']['coordinates'][0]
polygon = Polygon(polygon_coordinates)
return polygon
#############
polygon = fetch_polygon(AREA_FILE)
print("polygon loaded")
counter = 0
foundCounter = 0
with open(DATA_FILE, 'r') as f:
for line in f:
if counter == 0:
# write header to output file and skip
write2file(line)
counter += 1
continue
latlon = line.split(',')[:2]
lat = float(latlon[0])
lon = float(latlon[1])
# Create a Point object
point = Point(lon, lat)
# Check if the point is inside the polygon
is_inside = point.within(polygon)
if is_inside:
write2file(line)
foundCounter += 1
counter += 1
if counter % 10**5 == 0:
print(f"Parsed {counter/(10**6)}M lines, found {foundCounter}")
print(f"done, parsed {counter/(10**6)}M rows total, got {foundCounter/(10**3)}K matches")

Steps for extraction by lat-long bounds

  1. Fetch the .gz from https://sites.research.google/open-buildings/#download
  2. Unzip it
  3. Get your max and min lat-long values. You can use https://geojson.io or any other tool
  4. Populate them in the top lines in the -bounds python script. Take care not to put min in max etc.
  5. Change the filename values as appropriate
  6. Run the -bounds python script
  7. You'll see the output CSV file
  8. Open QGIS
  9. Menu > Layer > Add Layer > Add Delimited Text Layer
  10. Select the file, make sure WKT / Well Known Text option is chosen. Press Add
  11. Now you see your shapes in QGIS map
  12. Right-click layer, Export
  13. Export to Geojson or the shapefile format of your choice

Steps for extraction by polygon/shape

  1. Fetch the .gz from https://sites.research.google/open-buildings/#download
  2. Unzip it
  3. Draw the area you want in https://geojson.io (single shape only, modify the script if you do multi-polygon etc), and save the file as .geojson
  4. pip install shapely
  5. Change the filename values at top
  6. Run the -polygon python script
  7. You'll see the output CSV file
  8. Open QGIS
  9. Menu > Layer > Add Layer > Add Delimited Text Layer
  10. Select the file, make sure WKT / Well Known Text option is chosen. Press Add
  11. Now you see your shapes in QGIS map
  12. Right-click layer, Export
  13. Export to Geojson or the shapefile format of your choice

Details

  • This takes the centroid lat-long which are the first two columns in the source CSV data, and does a lat-long comparison (for bounds) or point-in-polygon operation (for polygon) on every line. Dumps the matching ones into output file
  • The scripts loads and processes only one line at a time, so should work on the very large CSV files on regular / entry-level laptops etc also. Let me know if that's not happening for you.

Clarifications

  • All mentioned files must be located in the same folder and this script also in that same folder. Else put in folder paths as needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment