@mciantyre
Last active December 18, 2023 02:39
KML to CSV in Python
"""
A script to extract all of the LineString information from a very large KML file. It formats the data as a CSV file so
that you can import it into the NDB of Google App Engine using the Python standard library. I ran this
script locally to generate the CSV. It processed a ~70 MB KML down to a ~36 MB CSV in about 8 seconds.

The KML had coordinates ordered by
[Lon, Lat, Alt, ' ', Lon, Lat, Alt, ' ',...] (' ' is a space)
The script removes the altitude to put the coordinates in a single CSV row ordered by
[Lat,Lon,Lat,Lon,...]

Dependencies:
- Beautiful Soup 4
- lxml

I found a little bit of help online for using BeautifulSoup to process a KML file. I put this online to serve as
another example. Some things I learned:
- the BeautifulSoup parser *needs* to be 'xml'. I spent too much time debugging why the default one wasn't working, and
it was because the default is an HTML parser, not an XML parser.

tl;dr
KML --> CSV so that GAE can go CSV --> NDB
"""
from bs4 import BeautifulSoup
import csv
def process_coordinate_string(coordinate_string):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = coordinate_string.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        ret.append(comma_split[1])  # lat
        ret.append(comma_split[0])  # lng
    return ret
def main():
    """
    Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    """
    with open('doc.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
        # Python 3's csv module expects a text-mode file opened with newline='' (Python 2 used 'wb').
        with open('out.csv', 'w', newline='') as csvfile:
            writer = csv.writer(csvfile)
            for coords in s.find_all('coordinates'):
                writer.writerow(process_coordinate_string(coords.string))


if __name__ == "__main__":
    main()
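
The docstring's transformation (`lon,lat,alt` triples in, `lat,lon` pairs out, one CSV row per `<coordinates>` element) can also be sketched with only the standard library. This is a hedged variant, not the gist's method: it swaps BeautifulSoup for `xml.etree.ElementTree`, splits on any whitespace instead of relying on the `[1:]` slice, and writes to an in-memory buffer. `SAMPLE_KML`, `coordinates_to_row`, and `kml_to_csv_text` are hypothetical names invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; real KML files are far larger.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <LineString>
        <coordinates>
          -80.1,25.7,0 -80.2,25.8,0
          -80.3,25.9,0
        </coordinates>
      </LineString>
    </Placemark>
  </Document>
</kml>
"""


def coordinates_to_row(text):
    """Turn 'lon,lat,alt lon,lat,alt ...' into [lat, lon, lat, lon, ...]."""
    row = []
    # split() with no argument splits on any run of whitespace and ignores
    # leading/trailing whitespace, so no [1:] slice is needed.
    for triple in text.split():
        parts = triple.split(",")
        row.extend([parts[1], parts[0]])  # lat, then lon; altitude dropped
    return row


def kml_to_csv_text(kml_text):
    """Return CSV text with one row per <coordinates> element."""
    root = ET.fromstring(kml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for elem in root.iter():
        # ElementTree reports namespaced tags as '{uri}name'; compare the local name.
        if elem.tag.rsplit("}", 1)[-1] == "coordinates" and elem.text:
            writer.writerow(coordinates_to_row(elem.text))
    return out.getvalue()


print(kml_to_csv_text(SAMPLE_KML))
```

For the sample above this prints a single row containing three lat/lon pairs. For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`; whether ElementTree is fast enough on a ~70 MB file is untested here.
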
@ivanskigib

I am using the above examples but I only get the first and last coordinate in a csv file. It is as if it is not looping, however since I am getting the first and last coordinate I have to assume that it is reading the coordinates list.

`def process_coordinate_string(str):
    # Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    ret = []
    comma_split = str.split(',')
    return [comma_split[1], comma_split[0]]

def main():
    # Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    with open('61956195-6202689-a300234067548720_2022-02-23-16-15-48.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
        with open('trajectory-6195.csv', 'w', newline='') as csvfile:
            writer = csv.writer(csvfile)
            for coords in s.find_all('coordinates'):
                writer.writerow(process_coordinate_string(coords.string))

if __name__ == "__main__":
    main()`

I am a relative beginner. Any reason as to why that may be happening?
Thanks
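
One likely explanation, offered as a sketch rather than a confirmed diagnosis: the modified function splits the whole string on commas only and returns a single pair, so each `<coordinates>` element contributes just one lat/lon pair to its row, however many points it holds. The sample string below is hypothetical and assumes the `lon,lat,alt` layout described in the gist's docstring; `comma_only` and `every_point` are illustrative names.

```python
# Hypothetical coordinate string in the 'lon,lat,alt lon,lat,alt ...' layout.
sample = "-80.1,25.7,0 -80.2,25.8,0 -80.3,25.9,0"


def comma_only(coordinate_string):
    """Mirrors the modified function: one comma split, one pair returned."""
    comma_split = coordinate_string.split(",")
    return [comma_split[1], comma_split[0]]


def every_point(coordinate_string):
    """Split into points on whitespace first, then split each point on commas."""
    row = []
    for point in coordinate_string.split():
        lon, lat = point.split(",")[:2]  # ignore altitude if present
        row.extend([lat, lon])
    return row


print(comma_only(sample))   # only the first point's lat/lon
print(every_point(sample))  # lat/lon for all three points
```

If the file holds one multi-point `<coordinates>` element per track, looping over the whitespace-separated points as in `every_point` should bring back the missing rows' worth of data. Whether that matches this particular KML is an assumption.
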