@soobrosa
Last active December 18, 2015 07:38
Light ETL script that reformats web data from http://www.hydroinfo.hu/Html/archivum/archiv_tabla.html into a TSV with date and value columns.
# Python 3 port of the original Python 2 script.
with open('vizallas.txt', 'r') as fi, open('vizallas.tsv', 'w') as fo:
    year = ''
    for li in fi:
        # fix up days that do not exist in a given month: an empty cell is
        # assumed to show up as two consecutive spaces and is marked with '...'
        it = li.strip().replace('  ', ' ... ').split(' ')
        if len(it) < 2:
            # not a data line
            if it[0].strip().isdigit():
                if int(it[0]) > 2000:
                    # caught a year
                    year = it[0].strip()
        else:
            # data line holding one day's value for every month
            day = '0' + it[0].strip()
            day = day[-2:]
            for mo in range(1, 13):
                if it[mo].strip() != '...':
                    month = '0' + str(mo)
                    month = month[-2:]
                    # one output row per existing date: YYYY-MM-DD <tab> value
                    print(year + '-' + month + '-' + day + '\t' + it[mo].strip(), file=fo)
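
The reshaping step can be tried without the input file. What follows is a minimal sketch, not part of the original script: the function name `reshape` and the sample rows are made up for illustration, and the layout of `vizallas.txt` (a year on its own line, then one row per day with twelve space-separated monthly values, and two consecutive spaces where a day does not exist in a month) is an assumption inferred from the script's comments.

```python
def reshape(lines, missing='...'):
    """Yield (date, value) pairs from the assumed day-by-month table layout."""
    year = ''
    for line in lines:
        # Assumption: an empty cell appears as two consecutive spaces.
        fields = line.strip().replace('  ', ' ' + missing + ' ').split(' ')
        if len(fields) < 2:
            # Not a data line; remember it if it looks like a year.
            token = fields[0].strip()
            if token.isdigit() and int(token) > 2000:
                year = token
            continue
        day = fields[0].strip().zfill(2)
        for month in range(1, 13):
            value = fields[month].strip()
            if value != missing:
                yield '%s-%02d-%s' % (year, month, day), value


# Hypothetical sample: day 30 has no February value.
sample = [
    '2012',
    '1 101 102 103 104 105 106 107 108 109 110 111 112',
    '30 201  203 204 205 206 207 208 209 210 211 212',
]

rows = list(reshape(sample))
for date, value in rows:
    print(date + '\t' + value)
```

Day 30 yields eleven rows instead of twelve, because the empty February cell is turned into the `...` marker and skipped.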