@eidge
Last active June 27, 2016 12:55
defmodule Weather.NOAA.DirectoryListingHTMLParser do
  @moduledoc """
  Implements utilities to parse NOAA's cycle directory at
  http://nomads.ncep.noaa.gov/cgi-bin/filter_<model-name>.pl
  """

  @doc """
  Takes an HTML input and returns a list of available cycles.

  Args:
    - html - The HTML input to be parsed, string

  Returns a list of cycle maps (%{date: Timex.Date, cycle: integer})
  """
  def cycles(html) do
    # Match link texts such as "gfs.2016062700" (8 date digits + 2 cycle digits).
    # Requiring exactly 10 digits keeps a bare "gfs." from reaching the parsers.
    Regex.scan(~r/<a[^<]*>(gfs\.\d{10})<\/a>/, html)
    |> Enum.map(&parse_date_and_cycle/1)
  end

  defp parse_date_and_cycle(match) do
    {date, cycle} = extract_date_and_cycle(match)
    %{date: parse_date(date), cycle: parse_cycle(cycle)}
  end

  # Regex.scan/2 returns [full_match, capture]; the capture is the last element.
  # Splitting at 8 separates "YYYYMMDD" from the trailing cycle hour.
  defp extract_date_and_cycle(match) do
    match
    |> List.last()
    |> String.replace("gfs.", "")
    |> String.split_at(8)
  end

  defp parse_date(str) do
    # {0M} and {0D} are the zero-padded month and day directives.
    Timex.parse!(str, "{YYYY}{0M}{0D}") |> Timex.to_date()
  end

  defp parse_cycle(str) do
    String.to_integer(str)
  end
end
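
A minimal usage sketch of the same parsing steps, written against the Elixir standard library only so it runs without Timex. The module name `CycleSketch`, the sample HTML, and the entry format `gfs.YYYYMMDDHH` (inferred from the regex and the `String.split_at(8)` call above) are assumptions for illustration, not part of the original module or a guarantee about NOAA's page.

```elixir
defmodule CycleSketch do
  # Hypothetical stdlib-only variant of cycles/1, for illustration only.
  # Assumes each link text looks like "gfs.YYYYMMDDHH".
  def cycles(html) do
    ~r/<a[^<]*>gfs\.(\d{8})(\d{2})<\/a>/
    |> Regex.scan(html, capture: :all_but_first)
    |> Enum.map(fn [date, cycle] ->
      %{date: parse_date(date), cycle: String.to_integer(cycle)}
    end)
  end

  # Split "YYYYMMDD" with binary pattern matching and build a stdlib Date.
  defp parse_date(<<y::binary-size(4), m::binary-size(2), d::binary-size(2)>>) do
    Date.new!(String.to_integer(y), String.to_integer(m), String.to_integer(d))
  end
end

# Made-up directory listing fragment.
html = """
<a href="?dir=%2Fgfs.2016062706">gfs.2016062706</a>
<a href="?dir=%2Fgfs.2016062700">gfs.2016062700</a>
"""

IO.inspect(CycleSketch.cycles(html))
```

Capturing the date and cycle as separate regex groups avoids the replace-then-split step, at the cost of hard-coding the `gfs.` prefix in one more place.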