@alpercalisir
Created November 24, 2018 13:58
Creates PASCAL VOC formatted XML given a csv file
import pandas as pd
# Standard-library ElementTree; there is no xmlAnnotation package.
import xml.etree.ElementTree as ET

fields = ['NAME_ID', 'XMIN', 'YMIN', 'W', 'H', 'XMAX', 'YMAX']
df = pd.read_csv('loose_bb_test.csv', usecols=fields)


# Change the name of the file.
# This will replace the / with -
def nameChange(x):
    return x.replace("/", "-")


df['NAME_ID'] = df['NAME_ID'].apply(nameChange)

# Only the first two rows are converted here; use range(len(df)) to convert every row.
for i in range(0, 2):
    # W and H from the CSV are written as the image size. If they hold box
    # dimensions in your data, replace them with the real image size.
    height = df['H'].iloc[i]
    width = df['W'].iloc[i]
    depth = 3

    annotation = ET.Element('annotation')
    ET.SubElement(annotation, 'folder').text = 'images'
    ET.SubElement(annotation, 'filename').text = str(df['NAME_ID'].iloc[i])
    ET.SubElement(annotation, 'segmented').text = '0'

    size = ET.SubElement(annotation, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = str(depth)

    # One <object> per annotation: every box is labelled 'face'.
    ob = ET.SubElement(annotation, 'object')
    ET.SubElement(ob, 'name').text = 'face'
    ET.SubElement(ob, 'pose').text = 'Unspecified'
    ET.SubElement(ob, 'truncated').text = '0'
    ET.SubElement(ob, 'difficult').text = '0'

    bbox = ET.SubElement(ob, 'bndbox')
    ET.SubElement(bbox, 'xmin').text = str(df['XMIN'].iloc[i])
    ET.SubElement(bbox, 'ymin').text = str(df['YMIN'].iloc[i])
    ET.SubElement(bbox, 'xmax').text = str(df['XMAX'].iloc[i])
    ET.SubElement(bbox, 'ymax').text = str(df['YMAX'].iloc[i])

    # Write one XML file per row, named after the (renamed) NAME_ID.
    fileName = str(df['NAME_ID'].iloc[i])
    tree = ET.ElementTree(annotation)
    tree.write(fileName + ".xml", encoding='utf8')
@HariniNarasimhan

Hi,
Does this support annotating multiple bounding boxes in a single image?
If so, please explain how.

@kunaljain0

This won't work for multiple objects in a single image. You will have to collect the row indexes for each image in a list or dict and loop over them.
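The grouping idea can be sketched roughly as follows. This is a minimal illustration, not the gist author's code: it assumes the same `NAME_ID`, `XMIN`, `YMIN`, `XMAX`, `YMAX` columns as the script above, uses made-up sample rows, and the helper names `group_rows_by_image` and `build_annotation` are hypothetical. The `<size>` element is left out because the sample rows carry no image dimensions.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_rows_by_image(csv_text):
    """Collect every CSV row that belongs to the same image (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["NAME_ID"]].append(row)
    return groups


def build_annotation(filename, rows):
    """Build one <annotation> element holding one <object> per row."""
    annotation = ET.Element("annotation")
    ET.SubElement(annotation, "folder").text = "images"
    ET.SubElement(annotation, "filename").text = filename
    for row in rows:
        ob = ET.SubElement(annotation, "object")
        ET.SubElement(ob, "name").text = "face"
        bbox = ET.SubElement(ob, "bndbox")
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            # CSV columns are upper-case versions of the VOC tag names.
            ET.SubElement(bbox, tag).text = row[tag.upper()]
    return annotation


# Made-up sample data in the column layout of the gist's CSV.
SAMPLE_CSV = """NAME_ID,XMIN,YMIN,XMAX,YMAX
a-1.jpg,10,20,50,60
a-1.jpg,70,80,120,140
b-2.jpg,5,5,30,40
"""

annotations = {
    name: build_annotation(name, rows)
    for name, rows in group_rows_by_image(SAMPLE_CSV).items()
}
for name, element in annotations.items():
    print(name, len(element.findall("object")))
```

Each element can then be written with `ET.ElementTree(element).write(name + ".xml")`, as in the original script.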

@Laudarisd

Thanks for the nice script.

I added some code that loops over multiple images and classes.

If anyone is looking for that script, it can be found here:

https://github.com/Laudarisd/Data_preprocessing/blob/main/csv_voc_xml.py

@NickosKal

@Laudarisd Hey, does your script work? I would like to take a look at it. Could you make it public so anyone can see it?

@Laudarisd

Oh, sorry, the link didn't work there.
Yes, the code works perfectly.
It does 2 tasks:

  1. Converts CSV to VOC format (i.e. XML) with all classes and writes separate XML files based on file names.
  2. Converts only the desired classes to XML format. For this, you need to uncomment the if/else condition in the script.
    Here is the link:
    https://github.com/Laudarisd/Data_preprocessing/blob/860d76b0e12576aaf0333ba215bf6f9d628201e5/csv_to_voc_xml.py
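As a rough illustration of the second task (keeping only the desired classes), the filtering step might look like the sketch below. It is not taken from the linked script: the `class` column name, the sample rows, the `WANTED_CLASSES` set, and the `rows_for_classes` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical set of class names to keep.
WANTED_CLASSES = {"dent", "scratch"}


def rows_for_classes(csv_text, wanted):
    """Return only the CSV rows whose 'class' column is in `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["class"] in wanted]


# Made-up sample data; real column names depend on your CSV.
SAMPLE_CSV = """filename,class,xmin,ymin,xmax,ymax
img1.jpg,dent,10,10,40,40
img1.jpg,rust,50,50,90,90
img2.jpg,scratch,5,5,20,20
"""

kept = rows_for_classes(SAMPLE_CSV, WANTED_CLASSES)
print([(row["filename"], row["class"]) for row in kept])
```

The filtered rows can then be passed to whatever code builds the XML, so unwanted classes never reach the output files.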

@NickosKal

NickosKal commented May 31, 2021 via email

@Laudarisd

You are welcome.

@Kannan665

I have a question. I am using "Apeer Micro" for annotations, as it is user friendly and packed with features like bounding boxes, polylines, and polygons, and is primarily used for microscopy and segmentation. The output of Apeer Micro is in "json" and "csv" formats. I intend to use Apeer Micro for annotations because I have object detection projects where I am using the TensorFlow 2 Object Detection API, which requires annotations in Pascal VOC XML format to create TFRecords. I will share the "json"/"csv" files, and it would be helpful if you could check the feasibility of converting them to Pascal VOC format using your script.

@Laudarisd

@Kannan665

I will try if I can. Just let me know your JSON file format. Thanks.

@josecarloslacayo

Thanks for sharing, Laudarisd. Your code for multiple boxes was very useful for solving my problem too :-)

https://github.com/Laudarisd/Data_preprocessing/blob/860d76b0e12576aaf0333ba215bf6f9d628201e5/csv_to_voc_xml.py

@Laudarisd

@josecarloslacayo You are welcome, Happy to hear that.

@Kannan665

The ".json" file I get when I use Apeer Micro for labelling is given below:
[
  [
    "Scope Name", "ID", "Annotation Name", "Class",
    "X Min", "X Max", "Y Min", "Y Max",
    "Z Min", "Z Max", "T Min", "T Max",
    "Type", "Color (RGB Hex)", "Area", "Area Unit",
    "Circumference", "Circumference Unit", "Length", "Length Unit",
    "Count", "Geometry", "Rotation"
  ],
  [
    "1.jpg", 373466, "bad-001", "bad",
    1072, 1102, 106, 119,
    1, 1, 1, 1,
    "Rectangle", "#FFFFFF", 48.5, "mm²",
    30.3, "mm", null, null,
    null, [[1071, 105], [1101, 118]], null
  ]
]

A typical ".xml" file produced by "LabelImg" in Pascal VOC format for TensorFlow Object Detection is given below:

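<annotation>
  <folder>Dent</folder>
  <filename>image_1_00.jpg</filename>
  <path>E:\Plunger_Dataset_Unet_Semantic_Segmentation\train\Dent\image_1_00.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>520</width>
    <height>240</height>
    <depth>1</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>dent</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>163</xmin>
      <ymin>74</ymin>
      <xmax>184</xmax>
      <ymax>116</ymax>
    </bndbox>
  </object>
</annotation>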

I am unable to attach the ".json" and ".xml" files. Please let me know if these details of the .json and .xml files are enough for you.

@Laudarisd

@Kannan665 it would be easier if I had your JSON file with data, along with the XML format.
To convert JSON to Pascal VOC XML format, you first need to parse the JSON file and extract only the required information.

You can send me an email (wintermouse2020@gmail.com).

I usually convert JSON to CSV with the required information and then convert that to XML. If it helps, you can check this JSON-to-CSV script:

https://github.com/Laudarisd/Data_preprocessing/blob/41772aed3374202042f75f87237af8aaaee0a4e5/many_json_to_csv.py
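For the header-row-plus-data-rows JSON layout posted earlier in this thread, the "parse and extract" step could be sketched along these lines. This is only an illustration under assumptions: the sample below is a trimmed subset of the columns from that post, the helper names `load_records` and `record_to_object` are hypothetical, and it is not the code in the linked script.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed sample: first list is the header row, the rest are data rows.
SAMPLE_JSON = """
[
  ["Scope Name", "Class", "X Min", "X Max", "Y Min", "Y Max", "Type"],
  ["1.jpg", "bad", 1072, 1102, 106, 119, "Rectangle"]
]
"""


def load_records(json_text):
    """Pair each data row with the header row to get one dict per annotation."""
    header, *rows = json.loads(json_text)
    return [dict(zip(header, row)) for row in rows]


def record_to_object(record):
    """Build a Pascal VOC <object> element from one parsed record."""
    ob = ET.Element("object")
    ET.SubElement(ob, "name").text = record["Class"]
    ET.SubElement(ob, "pose").text = "Unspecified"
    ET.SubElement(ob, "truncated").text = "0"
    ET.SubElement(ob, "difficult").text = "0"
    bbox = ET.SubElement(ob, "bndbox")
    ET.SubElement(bbox, "xmin").text = str(record["X Min"])
    ET.SubElement(bbox, "ymin").text = str(record["Y Min"])
    ET.SubElement(bbox, "xmax").text = str(record["X Max"])
    ET.SubElement(bbox, "ymax").text = str(record["Y Max"])
    return ob


# Keep only rectangles, since VOC bounding boxes cannot describe polygons.
records = [r for r in load_records(SAMPLE_JSON) if r["Type"] == "Rectangle"]
objects = [record_to_object(r) for r in records]
print(ET.tostring(objects[0], encoding="unicode"))
```

The `Scope Name` field holds the image file name, so records can be grouped on it to place several `<object>` elements in one annotation file.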

@imadgohar

imadgohar commented Aug 10, 2022

@kunaljain0 @Laudarisd
I am trying to convert my CSV file to XML. I changed the fields to match my CSV file: ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']. I am using Colab but am unable to run "import xmlAnnotation.etree.cElementTree as ET". When I googled it, I found this: "A file named xml.py or a package named xml in the current directory shadows the standard library package with the same name." However, I don't have any such file in my directory.

What is the issue? Can you please guide me? Thanks.

@Laudarisd

Laudarisd commented Aug 10, 2022

Hi @kunaljain0, are you trying to convert your CSV to XML? If so, the import shouldn't be an issue in Colab.

@imadgohar

@Laudarisd,
Your code worked for me. Thanks

@Laudarisd

@imadgohar happy to hear that.

@imadgohar

imadgohar commented Oct 11, 2022 via email

@Laudarisd

Hi, I recently received a few emails regarding conversion. If you are still facing problems with CSV to XML conversion, please raise an issue here. I would be happy to help.
https://github.com/Laudarisd/Data_preprocessing/blob/860d76b0e12576aaf0333ba215bf6f9d628201e5/csv_to_voc_xml.py
Thanks
