@mciantyre
Last active December 18, 2023 02:39
KML to CSV in Python
"""
A script to take all of the LineString information out of a very large KML file. It formats it into a CSV file so
that you can import the information into the NDB of Google App Engine using the Python standard library. I ran this
script locally to generate the CSV. It processed a ~70 MB KML down to a ~36 MB CSV in about 8 seconds.
The KML had coordinates ordered by
[Lon, Lat, Alt, ' ', Lon, Lat, Alt, ' ',...] (' ' is a space)
The script removes the altitude to put the coordinates in a single CSV row ordered by
[Lat,Lon,Lat,Lon,...]
Dependencies:
- Beautiful Soup 4
- lxml
I found a little bit of help online for using BeautifulSoup to process a KML file. I put this online to serve as
another example. Some things I learned:
- the BeautifulSoup parser *needs* to be 'xml'. I spent too much time debugging why the default one wasn't working, and
it was because the default is an HTML parser, not an XML parser.
tl;dr
KML --> CSV so that GAE can go CSV --> NDB
"""
from bs4 import BeautifulSoup
import csv
def process_coordinate_string(str):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = str.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        ret.append(comma_split[1])    # lat
        ret.append(comma_split[0])    # lng
    return ret


def main():
    """
    Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    """
    with open('doc.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
        # 'wb' suits Python 2's csv module; on Python 3 use open('out.csv', 'w', newline='')
        with open('out.csv', 'wb') as csvfile:
            writer = csv.writer(csvfile)
            for coords in s.find_all('coordinates'):
                writer.writerow(process_coordinate_string(coords.string))


if __name__ == "__main__":
    main()
@shantanu848

As in my previous comment: when opening the output file, change 'wb' to 'w'. With 'wb' you are telling Python to write in binary mode, which the csv module does not accept on Python 3.

Thank you, helped me complete that task.
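
For anyone hitting the same error on Python 3, here is a minimal sketch of the text-mode version. The `write_rows` helper, the sample row, and the temporary path are made up for illustration; the `newline=''` argument is what the csv module documentation recommends for files passed to `csv.writer`.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file opened in text mode (Python 3)."""
    # newline='' lets the csv module control line endings itself.
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for row in rows:
            writer.writerow(row)


# Hypothetical sample row in [lat, lon, lat, lon, ...] order.
sample_rows = [["25.7", "-80.1", "25.8", "-80.2"]]
out_path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_rows(out_path, sample_rows)

# Read the file back to confirm the row survived the round trip.
with open(out_path, newline="") as f:
    read_back = list(csv.reader(f))
```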

@WxBDM

WxBDM commented Jul 6, 2021

This helped me, thanks! I needed something more pandas-friendly, so I edited it slightly. I'm sharing it in this thread in case someone else needs it (Python 3.x):

import pandas as pd
from bs4 import BeautifulSoup

def process_coordinate_string(str):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = str.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        ret.append(comma_split[1])    # lat
        ret.append(comma_split[0])    # lng
    return ret

def main():
    """
    Open the KML. Read the KML. Process every coordinate string and write a Lat/Lon CSV.
    """
    with open('input.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')

    # Collect points from every <coordinates> element, not only the last one
    data = []
    for coords in s.find_all('coordinates'):
        data.extend(process_coordinate_string(coords.string))

    # data is [lat, lon, lat, lon, ...]: even indexes are latitudes, odd are longitudes
    lats = [float(x) for index, x in enumerate(data) if index % 2 == 0]
    lons = [float(x) for index, x in enumerate(data) if index % 2 == 1]

    df = pd.DataFrame({'Lat': lats, 'Lon': lons})
    df.to_csv("kml_to_df.csv", index=False)

if __name__ == "__main__":
    main()
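
The de-interleaving step itself does not depend on pandas. As a rough sketch (the `split_lat_lon` helper and the sample list are hypothetical, and `io.StringIO` stands in for a real output file), slicing with a step of 2 separates the latitudes from the longitudes:

```python
import csv
import io


def split_lat_lon(flat):
    """Turn [lat, lon, lat, lon, ...] strings into two lists of floats."""
    lats = [float(x) for x in flat[0::2]]  # even positions hold latitudes
    lons = [float(x) for x in flat[1::2]]  # odd positions hold longitudes
    return lats, lons


# Hypothetical output of process_coordinate_string for two points.
flat = ["25.7", "-80.1", "25.8", "-80.2"]
lats, lons = split_lat_lon(flat)

# Write a two-column Lat/Lon table, one point per row.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Lat", "Lon"])
writer.writerows(zip(lats, lons))
csv_text = buffer.getvalue()
```

The two lists can be handed to `pd.DataFrame({'Lat': lats, 'Lon': lons})` exactly as in the comment above.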

@josmarcristello

josmarcristello commented Nov 12, 2021

A slight modification of WxBDM's version, since the KML files I was working with were not generated consistently. It also reads from a 'kml' folder and writes to a 'csv' folder under the same base filename, to allow for mass conversion. The function is now called with the KML filename as an argument:

kml2csv('test.kml')

from bs4 import BeautifulSoup
import pandas as pd

def process_coordinate_string(str):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = str.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        # Checks for len on the split, because depending on kml file generator you might get an empty 
        # string (which would be misinterpreted as a coordinate)
        if len(comma_split) == 3:
            ret.append(comma_split[1])  # lat
            ret.append(comma_split[0])  # lng
    return ret

def kml2csv(fname):
    """
    Open the KML. Read the KML. Process every coordinate string and write a Lat/Lon CSV.
    Input: Filename with extension ('example.kml'), located in 'kml' folder.
    Output: File with the same name as input, but in .csv format, located in 'csv' folder.
    """
    out_fname = fname.split('.kml')[0] + '.csv'
    with open('kml/' + fname, 'r') as f:
        s = BeautifulSoup(f, 'xml')

    # Collect points from every <coordinates> element, not only the last one
    data = []
    for coords in s.find_all('coordinates'):
        data.extend(process_coordinate_string(coords.string))

    lats = [float(x) for index, x in enumerate(data) if index % 2 == 0]
    lons = [float(x) for index, x in enumerate(data) if index % 2 == 1]
    df = pd.DataFrame({'Lat': lats, 'Lon': lons})

    df.to_csv("csv/" + out_fname, index=False)
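
If the generator separates tuples with newlines or tabs rather than single spaces, or omits the altitude (KML allows `lon,lat` without it), a more tolerant parser may help. This is only a sketch: `parse_coordinates` and `csv_name` are hypothetical names, and the sample string is invented.

```python
from pathlib import Path


def parse_coordinates(text):
    """Return [lat, lon, lat, lon, ...] from a KML <coordinates> string."""
    ret = []
    # split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and never yields empty strings,
    # so no [1:] slice is needed for a leading space.
    for chunk in (text or "").split():
        parts = chunk.split(",")
        # KML tuples are lon,lat with an optional altitude.
        if len(parts) >= 2:
            ret.append(parts[1])  # lat
            ret.append(parts[0])  # lon
    return ret


def csv_name(kml_name):
    """Swap the .kml extension for .csv, keeping the base filename."""
    return Path(kml_name).with_suffix(".csv").name


# Invented sample mixing newlines, a tab, and a tuple without altitude.
sample = "\n  -80.1,25.7,0\n  -80.2,25.8\t-80.3,25.9,12.5\n"
result = parse_coordinates(sample)
```

Accepting two-field tuples is a design choice; keep the strict `== 3` check if a missing altitude should be treated as bad data.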

@ivanskigib

I am using the above examples but I only get the first and last coordinate in a csv file. It is as if it is not looping, however since I am getting the first and last coordinate I have to assume that it is reading the coordinates list.

def process_coordinate_string(str):
    # Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    ret = []
    comma_split = str.split(',')
    return [comma_split[1], comma_split[0]]

def main():
    # Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    with open('61956195-6202689-a300234067548720_2022-02-23-16-15-48.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
        with open('trajectory-6195.csv', 'w', newline='') as csvfile:
            writer = csv.writer(csvfile)
            for coords in s.find_all('coordinates'):
                writer.writerow(process_coordinate_string(coords.string))

if __name__ == "__main__":
    main()

I am a relative beginner. Any reason as to why that may be happening?
Thanks
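
One likely explanation, sketched with a made-up sample string: the function in the comment above calls `split(',')` on the entire `<coordinates>` text and returns only the first two fields, so each `<coordinates>` element yields a single lat/lon pair no matter how many points it holds. Whether that fully accounts for the "first and last" rows depends on the KML file, which is not shown here. Looping over the whitespace-separated tuples, as the original gist does, returns every point. The names `first_pair_only` and `all_pairs` are hypothetical.

```python
def first_pair_only(text):
    """Mirrors the function in the comment above: one pair per string."""
    comma_split = text.split(",")
    return [comma_split[1], comma_split[0]]


def all_pairs(text):
    """Loop over every whitespace-separated lon,lat[,alt] tuple."""
    ret = []
    for chunk in text.split():
        parts = chunk.split(",")
        if len(parts) >= 2:
            ret.append(parts[1])  # lat
            ret.append(parts[0])  # lon
    return ret


# Hypothetical <coordinates> text holding three points.
sample = " -80.1,25.7,0 -80.2,25.8,0 -80.3,25.9,0"
truncated = first_pair_only(sample)  # only the first point survives
complete = all_pairs(sample)         # all three points
```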
