@pulsejet
Created May 31, 2018 10:49
Separate xml files removing tags (CRUDE)
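import xml.etree.ElementTree as ET

# Parse the whole input document into memory.
tree = ET.parse('xmlsdev.xml')
root = tree.getroot()

with open('corrected', 'w', encoding='utf-8') as file:
    i = 0
    for sentence in root.iter('sentence'):
        # Empty each direct <del> child but keep the text that follows it.
        # clear() also wipes the tail, so save and restore it.
        for child in sentence.findall('del'):
            tail = child.tail
            child.clear()
            child.tail = tail
        # Write the remaining text of the sentence, one sentence per line.
        file.write("".join(sentence.itertext()))
        file.write('\n')
        # Print progress every 100 sentences.
        if i % 100 == 0:
            print(i)
        i += 1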
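
To make the effect of the script above easier to see, here is a minimal sketch that applies the same `<del>`-stripping step to a small in-memory document instead of a file. The `SAMPLE` string and the `strip_del` helper are hypothetical names introduced only for illustration; the real contents of `xmlsdev.xml` are not shown in the gist, so the sample structure is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real 'xmlsdev.xml' is not shown in the gist.
SAMPLE = (
    "<doc>"
    "<sentence>I <del>has</del>have a cat.</sentence>"
    "<sentence>No edits here.</sentence>"
    "</doc>"
)


def strip_del(sentence):
    """Empty the direct <del> children of `sentence` (in place) and return its text."""
    for child in sentence.findall('del'):
        tail = child.tail   # clear() would discard the tail, so save it first
        child.clear()       # drops the element's text, attributes and children
        child.tail = tail   # restore the text that followed the <del> element
    return "".join(sentence.itertext())


root = ET.fromstring(SAMPLE)
lines = [strip_del(s) for s in root.iter('sentence')]
print(lines)
```

As in the original script, `findall('del')` only matches `<del>` elements that are direct children of a `<sentence>`, so a `<del>` nested inside some other element would be left untouched.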