Gist gardart/50c4c4bdd3bace67c7c515d3e4794970, last active April 11, 2018 02:00
Convert Icelandic weather data (all stations) from an HTML table to CSV format: http://brunnur.vedur.is/athuganir/athtafla
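# Convert Icelandic weather data (all stations) from an HTML table to CSV format
# Requires GNU grep and GNU sed: \|, \?, \+, \s, the sed I flag and \n in sed replacements are GNU extensions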
$ curl "http://brunnur.vedur.is/athuganir/athtafla/2015081210.html" 2>/dev/null |
  grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH' |
  tr -d '\n\r' |
  sed 's/<\/TR[^>]*>/\n/Ig' |
  sed 's/<\/\?\(TABLE\|TR\)[^>]*>//Ig' |
  sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig' |
  sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig' |
  sed 's/<[^>]\+>//Ig' |
  sed 's/^[ \t]*//g' |
  sed '/^\s*$/d' |
  sed 's/^/2015081210,/'
Output:
2015081210,33751,Siglufjarðarvegur_Herkonugil,-99,6.9,6.9,7.9,80,6.7,7.1,10.2,92,-99
2015081210,33643,Stafá,40,9.3,8.9,9.5,38,4.9,4.9,7.1,79,-99
2015081210,32474,Steingrímsfjarðarheiði,440,4.4,3.9,4.5,65,11.5,11.6,14.2,99,-99
2015081210,31950,Stórholt,70,9.9,9.3,9.9,81,6.7,6.7,8.5,82,-99
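The example hard-codes the hour 2015081210 twice, once in the URL and once in the final sed. A minimal sketch, assuming every hourly page follows the athtafla/YYYYMMDDHH.html pattern of the example URL, derives both from one value. The helper names athtafla_url and current_hour are hypothetical, not part of the original gist, and the sketch assumes the file names use Icelandic local time (Iceland keeps UTC all year, so UTC and local time agree).

```shell
#!/bin/sh
# Sketch only: athtafla_url and current_hour are hypothetical helper names.
# Assumes every hourly page lives at athtafla/YYYYMMDDHH.html, as in the example.

# Print the page URL for a YYYYMMDDHH timestamp; reject anything else.
athtafla_url() {
  case $1 in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "athtafla_url: expected YYYYMMDDHH, got '$1'" >&2; return 1 ;;
  esac
  printf 'http://brunnur.vedur.is/athuganir/athtafla/%s.html\n' "$1"
}

# Current hour as YYYYMMDDHH in UTC. Iceland stays on UTC all year,
# so this is also the local hour there.
current_hour() {
  date -u +%Y%m%d%H
}
```

To use it, set ts=$(current_hour) (or any YYYYMMDDHH value), replace the hard-coded URL with "$(athtafla_url "$ts")" and the last step with sed "s/^/$ts,/"; the double quotes let the shell expand $ts. The sketch does not check whether the page for that hour has been published yet.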
############################################################
# How it works:
# Fetch the page (curl's progress meter and any errors go to stderr, which is discarded)
# curl "http://brunnur.vedur.is/athuganir/athtafla/2015081210.html" 2>/dev/null
# Keep only lines that contain TABLE, TR, TD or TH tags
# | grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH'
# Remove newlines and carriage returns, joining the table into a single line
# | tr -d '\n\r'
# Replace each </TR> with a newline, so every table row becomes one line
# | sed 's/<\/TR[^>]*>/\n/Ig'
# Remove the remaining TABLE and TR tags
# | sed 's/<\/\?\(TABLE\|TR\)[^>]*>//Ig'
# Remove a <TD> or <TH> at the start of a line and a TD or TH tag (normally </TD> or </TH>) at the end
# | sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig'
# Replace each </TD><TD> (or </TH><TH>) pair between cells with a comma
# | sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig'
# Remove any remaining tags (e.g. a <TD> that followed leading whitespace, or markup inside a cell)
# | sed 's/<[^>]\+>//Ig'
# Remove whitespace at the beginning of each line
# | sed 's/^[ \t]*//g'
# Remove empty lines
# | sed '/^\s*$/d'
# Add the timestamp (YYYYMMDDHH) to the beginning of each line
# | sed 's/^/2015081210,/'
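To see what the steps above do without fetching the live page, the same pipeline can be run offline on a small table. The sample HTML below is invented for illustration and is not the real page layout; html_table_to_csv and sample_html are hypothetical names, and the timestamp is passed as an argument instead of being hard-coded. Like the original, it assumes GNU grep and GNU sed.

```shell
#!/bin/sh
# Offline check of the athtafla pipeline on a made-up table (not the real page).
# Requires GNU grep and GNU sed: \|, \?, \+, \s, the I flag and \n in the
# sed replacement are GNU extensions.

# Hypothetical helper: HTML on stdin -> CSV on stdout, each row prefixed with $1.
html_table_to_csv() {
  grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH' |
    tr -d '\n\r' |
    sed 's/<\/TR[^>]*>/\n/Ig' |
    sed 's/<\/\?\(TABLE\|TR\)[^>]*>//Ig' |
    sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig' |
    sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig' |
    sed 's/<[^>]\+>//Ig' |
    sed 's/^[ \t]*//g' |
    sed '/^\s*$/d' |
    sed "s/^/$1,/"
}

# Made-up sample: a header row of TH cells, a row split over several
# indented lines, and a cell with nested markup.
sample_html() {
  cat <<'EOF'
<html><body>
<table border="1">
<tr><th>Nr</th><th>Name</th><th>Elev</th></tr>
<tr>
  <td>33643</td><td>Stafá</td><td>40</td>
</tr>
<tr><td>31950</td><td><b>Stórholt</b></td><td>70</td></tr>
</table>
</body></html>
EOF
}

sample_html | html_table_to_csv 2015081210
```

Running it prints 2015081210,Nr,Name,Elev then 2015081210,33643,Stafá,40 then 2015081210,31950,Stórholt,70. The second row shows why the "remove any remaining tags" and leading-whitespace steps are needed: its first <td> follows indentation, so the step anchored at ^<T[DH] misses it. The first row shows that a row of TH header cells comes out as an ordinary CSV line with the timestamp prefix.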