@crowesn
Last active March 5, 2019 15:33
import xml.etree.ElementTree as ET

tree = ET.parse('thorpe.xml')
root = tree.getroot()

# Print each abstract paragraph with its tab characters removed.
for item in root.iter('DISS_para'):
    # item.text is None for an empty element, so fall back to ''.
    abstract = (item.text or '').replace('\t', '')
    print(abstract)
@carohansen

Trying to add it to a CSV file, but it's parsing every letter. Not sure what I'm doing wrong.

import xml.etree.ElementTree as ET
import re
import csv

forCSV = []
forCSV.append(['abstract'])

tree = ET.parse('thorpe.xml')
root = tree.getroot()

combined_abstracts = ""

for item in root.iter("DISS_para"):
    combined_abstracts += re.sub('\t', '', item.text)

forCSV.append(combined_abstracts)

with open('outputFile.csv', 'w', newline='') as csvfile:
    wr = csv.writer(csvfile, delimiter=',')
    wr.writerows(forCSV)

@carohansen

sorry, that indentation is all fucked up

@carohansen

thanks, I found my mistake! you're a wizard!
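
The thread never says what the mistake was. One likely cause, assuming the script above is what was run: `csv.writer.writerows` expects each row to be a sequence of cells, and a bare string handed to it as a row is iterated one character at a time, so every letter lands in its own column. Wrapping the string in a list makes it a single cell. The sketch below is only an illustration of that behavior. `SAMPLE_XML` is a made-up stand-in for `thorpe.xml` (whose real layout is not shown here), `collect_abstract` and `rows_to_csv` are hypothetical helper names, and joining paragraphs with a space is a choice made for this example, not something taken from the original script.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for thorpe.xml; the real file's layout is not shown in the thread.
SAMPLE_XML = """<DISS_abstract>
<DISS_para>\tFirst paragraph.</DISS_para>
<DISS_para>Second paragraph.</DISS_para>
</DISS_abstract>"""


def collect_abstract(root):
    """Join the text of every DISS_para element, dropping tab characters."""
    parts = []
    for item in root.iter("DISS_para"):
        # item.text is None for an empty element, so fall back to ''.
        parts.append((item.text or "").replace("\t", ""))
    # Assumption: separate paragraphs with a single space.
    return " ".join(parts)


def rows_to_csv(rows):
    """Write rows to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=",").writerows(rows)
    return buffer.getvalue()


root = ET.fromstring(SAMPLE_XML)
abstract = collect_abstract(root)

# A bare string is iterated character by character, so each letter becomes a cell.
broken = rows_to_csv([["abstract"], abstract])

# Wrapping the string in a list makes it a one-cell row.
fixed = rows_to_csv([["abstract"], [abstract]])

print(fixed)
```

In the script posted above, the equivalent change would be `forCSV.append([combined_abstracts])` in place of `forCSV.append(combined_abstracts)`.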
