@har07
Last active April 26, 2016 13:18
lxml.html get table header elements
from lxml import html
raw = '''<table class="list">
<tr>
<th>Date(s)</th>
<th>Sport</th>
<th>Event</th>
<th>Location</th>
</tr>
<tr>
<td>Jan 18-31</td>
<td>Tennis</td>
<td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td>
<td>Melbourne, Australia</td>
</tr>
</table>'''
table = html.fromstring(raw)
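# iterating over the <table> element yields its child <tr> elements in order
rows = iter(table)
# the first row holds the <th> cells; collect their text
headers = [col.text for col in next(rows)]
print(headers)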
# output :
# ['Date(s)', 'Sport', 'Event', 'Location']
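
The same iterator can go on to read the data rows. The sketch below is one possible extension, not part of the original snippet: it pairs each remaining row with the headers to build a list of dicts (the name `records` is made up for illustration). It uses `text_content()` instead of `.text`, because `.text` is `None` for the Event cell, whose text sits inside a nested `<a>` element.

```python
from lxml import html

raw = '''<table class="list">
<tr>
<th>Date(s)</th>
<th>Sport</th>
<th>Event</th>
<th>Location</th>
</tr>
<tr>
<td>Jan 18-31</td>
<td>Tennis</td>
<td><a href="tennis-grand-slam/australian-open/index.htm">Australia Open</a></td>
<td>Melbourne, Australia</td>
</tr>
</table>'''

table = html.fromstring(raw)
rows = iter(table)

# first row: header cells
headers = [col.text_content().strip() for col in next(rows)]

# remaining rows: map each cell to its header (hypothetical helper structure)
records = [
    dict(zip(headers, (col.text_content().strip() for col in row)))
    for row in rows
]
print(records)
```

Each entry in `records` is keyed by header text, so a cell can be looked up as `records[0]['Event']`. This assumes every data row has the same number of cells as the header row; `zip` silently drops any extras.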