Skip to content

Instantly share code, notes, and snippets.

@rascoop
Forked from mciantyre/KmlToCsv.py
Created January 3, 2019 03:41
Show Gist options
  • Save rascoop/03f139c5802b0d80debbb72e26f6149e to your computer and use it in GitHub Desktop.
Save rascoop/03f139c5802b0d80debbb72e26f6149e to your computer and use it in GitHub Desktop.
KML to CSV in Python
"""
A script to take all of the LineString information out of a very large KML file. It formats it into a CSV file so
that you can import the information into the NDB of Google App Engine using the Python standard library. I ran this
script locally to generate the CSV. It processed a ~70 MB KML down to a ~36 MB CSV in about 8 seconds.
The KML had coordinates ordered by
[Lon, Lat, Alt, ' ', Lon, Lat, Alt, ' ',...] (' ' is a space)
The script removes the altitude to put the coordinates in a single CSV row ordered by
[Lat,Lon,Lat,Lon,...]
Dependencies:
- Beutiful Soup 4
- lxml
I found a little bit of help online for using BeautifulSoup to process a KML file. I put this online to serve as
another example. Some things I learned:
- the BeautifulSoup parser *needs* to be 'xml'. I spent too much time debugging why the default one wasn't working, and
it was because the default is an HTML parse, not XML.
tl;dr
KML --> CSV so that GAE can go CSV --> NDB
"""
from bs4 import BeautifulSoup
import csv
def process_coordinate_string(str):
"""
Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
"""
space_splits = str.split(" ")
ret = []
# There was a space in between <coordinates>" "-80.123...... hence the [1:]
for split in space_splits[1:]:
comma_split = split.split(',')
ret.append(comma_split[1]) # lat
ret.append(comma_split[0]) # lng
return ret
def main():
"""
Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
"""
with open('doc.kml', 'r') as f:
s = BeautifulSoup(f, 'xml')
with open('out.csv', 'wb') as csvfile:
writer = csv.writer(csvfile)
for coords in s.find_all('coordinates'):
writer.writerow(process_coordinate_string(coords.string))
if __name__ == "__main__":
main()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment