@vdobrev
Created January 2, 2018 21:46
# KPI 1. Domain dynamics
# values: None (1) / Small (1.1) / Medium (1.3) / Large (1.5)
# min/max = 1/1.5
require 'uri'
require 'open-uri'
require 'nokogiri'

# get the domain root of the result link url (scheme + host, so it can be fetched)
root = URI.join(url, '/').to_s
html = Nokogiri::HTML(URI.open(root))
# if the domain was not collected before, parse the html for dates
unless Domain.exists?(root)
found = []
# each pattern accepts 1- or 2-digit days and months, so a date is counted once per format
date_formats = [
  /\d{4}-\d{1,2}-\d{1,2}/,   # yyyy-mm-dd
  /\d{1,2}-\d{1,2}-\d{4}/,   # dd-mm-yyyy
  /\d{4}\.\d{1,2}\.\d{1,2}/, # yyyy.mm.dd
  /\d{1,2}\.\d{1,2}\.\d{4}/, # dd.mm.yyyy
  /\d{4}\/\d{1,2}\/\d{1,2}/, # yyyy/mm/dd
  /\d{1,2}\/\d{1,2}\/\d{4}/  # dd/mm/yyyy
]
# Nokogiri documents have no #scan, so scan the extracted page text instead
date_formats.each_with_index { |v,k| found[k] = html.text.scan(v).size }
# if any found, check how recent
else
# if checked recently (1w) return the latest value
# or get % change from last time .. https://github.com/postmodern/nokogiri-diff
end
# store result
Domain.new(root,html)
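
The "if any found, check how recent" step is only a comment in the code above. The following is a minimal sketch of one way it could work, not the author's method. The multipliers come from the KPI comment at the top. The day thresholds and the names `DYNAMICS`, `THRESHOLDS`, `latest_date` and `domain_dynamics` are assumptions made for illustration. Only yyyy-mm-dd dates are handled to keep it short, and `filter_map` requires Ruby 2.7 or later.

```ruby
require 'date'

# Multipliers taken from the KPI comment: None / Small / Medium / Large.
DYNAMICS = { none: 1.0, small: 1.1, medium: 1.3, large: 1.5 }.freeze

# Hypothetical recency thresholds in days (an assumption, not from the gist).
THRESHOLDS = { large: 7, medium: 30, small: 365 }.freeze

# Only ISO-style dates are parsed in this sketch.
ISO_DATE = /\b(\d{4})-(\d{1,2})-(\d{1,2})\b/

# Returns the most recent valid, non-future date found in text, or nil.
def latest_date(text, today: Date.today)
  text.scan(ISO_DATE).filter_map do |y, m, d|
    next unless Date.valid_date?(y.to_i, m.to_i, d.to_i)

    date = Date.new(y.to_i, m.to_i, d.to_i)
    date if date <= today
  end.max
end

# Maps the age of the latest date on the page to a KPI multiplier.
def domain_dynamics(text, today: Date.today)
  latest = latest_date(text, today: today)
  return DYNAMICS[:none] if latest.nil?

  age = (today - latest).to_i
  # Hash order is insertion order, so the tightest threshold is tried first.
  level, = THRESHOLDS.find { |_, max_days| age <= max_days }
  DYNAMICS[level || :none]
end
```

With the code above, `domain_dynamics(html.text)` would give the multiplier for a fetched page. Passing `today:` explicitly keeps the result reproducible when testing.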