@rotemtam
Created April 14, 2018 09:49
pascal voc xml to csv table
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    # Collect one row per <object> (bounding box) from every XML file in `path`.
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        # Look up width/height by tag name rather than relying on child order.
        size = root.find('size')
        for member in root.findall('object'):
            bbx = member.find('bndbox')
            xmin = int(bbx.find('xmin').text)
            ymin = int(bbx.find('ymin').text)
            xmax = int(bbx.find('xmax').text)
            ymax = int(bbx.find('ymax').text)
            label = member.find('name').text
            value = (root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     label,
                     xmin,
                     ymin,
                     xmax,
                     ymax
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    # Reads <cwd>/<dataset>/annotations/*.xml and writes labels_<dataset>.csv
    # into the current working directory.
    datasets = ['train', 'dev', 'test']
    for ds in datasets:
        image_path = os.path.join(os.getcwd(), ds, 'annotations')
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('labels_{}.csv'.format(ds), index=False)
        print('Successfully converted xml to csv.')


if __name__ == '__main__':
    main()
@SasikiranJ
SasikiranJ commented Apr 4, 2020

I have multiple bounding boxes in one file. Can you please tell me how to convert them?

@robisen1
robisen1 commented Apr 6, 2020

I have multiple bounding boxes in one file. Can you please tell me how to convert them?
My friend, look at the code for a bit. You will see that it just grabs the necessary information from the XML file and puts each bounding box into its own row, keyed by image name. So if, like me, you have images with many bounding boxes, you will get many rows in the CSV with the same file name (usually the image name) but a different bounding box. Here is an example of the output from my XML files. I hope that helps.

filename width height class xmin ymin xmax ymax
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 381 527 453 617
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 903 52 981 163
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 1285 388 1363 518
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 2519 563 2612 656
Scenario_9_3_1_crouching75ft-0002.jpg 3840 2160 pedestrian 381 530 474 629
Scenario_9_3_1_crouching75ft-0002.jpg 3840 2160 pedestrian 912 67 1000 175
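
The one-row-per-box behaviour described above can be checked with a small sketch. It uses only the standard library and parses a made-up annotation from a string, so the file name, image size and coordinates below are assumptions for illustration, and `rows_from_annotation` is a hypothetical helper that mirrors the inner loop of the script rather than code from the gist.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC style annotation with two objects in one image.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>pedestrian</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>pedestrian</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>380</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""


def rows_from_annotation(root):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) tuple per <object>."""
    filename = root.find('filename').text
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    rows = []
    for member in root.findall('object'):
        box = member.find('bndbox')
        rows.append((filename, width, height, member.find('name').text,
                     int(box.find('xmin').text), int(box.find('ymin').text),
                     int(box.find('xmax').text), int(box.find('ymax').text)))
    return rows


rows = rows_from_annotation(ET.fromstring(SAMPLE_XML.strip()))
for row in rows:
    print(row)  # two rows, both with filename 'example.jpg'
```

Because the loop runs over every `object` element, a file with several boxes needs no special handling: each box simply becomes another row with the same file name.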

@adamfahmi48
I failed to run this code. How can I run it to get results like yours? Please help me.

@adamfahmi48
Does this not work on Windows?

@robisen1
robisen1 commented Jun 6, 2020

I have some working code. If you're still interested, let me know.

@pat-daniel
Works like a charm to get the AutoML CSV format from the OpenImages dataset. Thanks!

@prateek09101996
@pat-daniel, can you post the code or send it to me?

@Ahanmr
Ahanmr commented Oct 14, 2020

I have some working code. If you're still interested, let me know.

Yes that would be really helpful.

@megha2503
I have some working code. If you're still interested, let me know.

Do you still have the code? I want to look into it.

@giorgosrigas
Hello. Thank you for this. What changes should I make to the code above in order to extract the images that only contain one single object?
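
One possible way to do this, sketched under assumptions: count the `object` elements in each annotation and skip the file unless there is exactly one. `single_object_row` is a hypothetical helper, not part of the gist, and the two annotations are invented test data. In the script above, the same check could go right after `root = tree.getroot()`.

```python
import xml.etree.ElementTree as ET


def single_object_row(root):
    """Return the CSV row for an annotation with exactly one <object>, else None."""
    objects = root.findall('object')
    if len(objects) != 1:
        return None  # skip images with zero or several objects
    member = objects[0]
    box = member.find('bndbox')
    size = root.find('size')
    return (root.find('filename').text,
            int(size.find('width').text),
            int(size.find('height').text),
            member.find('name').text,
            int(box.find('xmin').text),
            int(box.find('ymin').text),
            int(box.find('xmax').text),
            int(box.find('ymax').text))


# Hypothetical annotations, used only to exercise the function.
OBJECT = ('<object><name>cat</name><bndbox><xmin>1</xmin><ymin>2</ymin>'
          '<xmax>3</xmax><ymax>4</ymax></bndbox></object>')
HEADER = ('<filename>{}</filename>'
          '<size><width>100</width><height>50</height><depth>3</depth></size>')
one = ET.fromstring('<annotation>' + HEADER.format('one.jpg') + OBJECT + '</annotation>')
two = ET.fromstring('<annotation>' + HEADER.format('two.jpg') + OBJECT * 2 + '</annotation>')

print(single_object_row(one))  # a row for one.jpg
print(single_object_row(two))  # None, because two.jpg has two objects
```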

@Panther212
Did anyone get the working code?

@Panther212
Did anyone get the working code?

It worked perfectly for me. Thank you so much, you have no idea how much time and effort you have saved me :)

@nandini211995
I have some working code. If you're still interested, let me know.

Hi, I am facing the same issue: I have XML files with multiple bounding boxes. How do I convert them into a CSV file? Could you please share the code with me?

@ThiloFink
ThiloFink commented Jul 27, 2021

Thanks!
The code executed without errors but where can I find the csv file or files now?

The easiest way is to use Roboflow!!!
You can also resize images and do other augmentations.
https://roboflow.com/
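
On the question of where the CSV files end up: the script passes a relative name, `labels_<dataset>.csv`, to `to_csv`, so the files are written to whatever directory the script is run from. A minimal sketch for printing the resolved locations follows; `output_csv_path` is a hypothetical helper introduced here for illustration.

```python
import os


def output_csv_path(ds):
    """Absolute path that the relative name 'labels_<ds>.csv' resolves to right now."""
    return os.path.abspath('labels_{}.csv'.format(ds))


for ds in ['train', 'dev', 'test']:
    path = output_csv_path(ds)
    status = '(exists)' if os.path.exists(path) else '(not written yet)'
    print(path, status)
```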

ghost commented Oct 28, 2021

The CSV file is created, but the content is not there. Only the first row with the headings is there; the rest is empty.

@MuzziMuzzi
MuzziMuzzi commented Nov 5, 2021

The CSV file is created, but the content is not there. Only the first row with the headings is there; the rest is empty.

Did you ever figure this out?

Update: nvm, I figured it out. It's because it's looking for the xml files in images/train. Change the paths or, if you're lazy like me, just create an image/train directory and put them in there.
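
A header-only CSV is what you get when the glob pattern matches no files: `glob.glob` returns an empty list without raising, so the DataFrame is built from zero rows. The script as posted looks in `<cwd>/<dataset>/annotations`. A small guard, sketched here with a hypothetical helper `find_xml_files`, makes a wrong path fail loudly instead. The temporary directory is only there so the example runs on its own.

```python
import glob
import os
import tempfile


def find_xml_files(path):
    """Return the sorted .xml files directly inside `path`, or raise if there are none."""
    pattern = os.path.join(path, '*.xml')
    xml_files = sorted(glob.glob(pattern))
    if not xml_files:
        raise FileNotFoundError('No XML files match {}'.format(pattern))
    return xml_files


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    try:
        find_xml_files(tmp)  # empty directory: raises
    except FileNotFoundError as err:
        print(err)
    open(os.path.join(tmp, 'a.xml'), 'w').close()
    print(find_xml_files(tmp))  # now lists the one file
```

Calling `find_xml_files(image_path)` before the conversion would point straight at the directory the script is actually reading.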

@wandyu
wandyu commented Nov 24, 2021

I have multiple bounding boxes in one file. Can you please tell me how to convert them?
My friend, look at the code for a bit. You will see that it just grabs the necessary information from the XML file and puts each bounding box into its own row, keyed by image name. So if, like me, you have images with many bounding boxes, you will get many rows in the CSV with the same file name (usually the image name) but a different bounding box. Here is an example of the output from my XML files. I hope that helps.

filename width height class xmin ymin xmax ymax
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 381 527 453 617
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 903 52 981 163
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 1285 388 1363 518
Scenario_9_3_1_crouching75ft-0001.jpg 3840 2160 pedestrian 2519 563 2612 656
Scenario_9_3_1_crouching75ft-0002.jpg 3840 2160 pedestrian 381 530 474 629
Scenario_9_3_1_crouching75ft-0002.jpg 3840 2160 pedestrian 912 67 1000 175

Yes. I am also facing the same issue. Please share the code.

@Lilkol
Lilkol commented Jun 15, 2022

I'm working on the same code. If you are still interested in guiding me through it, please help me convert from XML to a CSV file.
