Skip to content

Instantly share code, notes, and snippets.

@nuria
Last active September 21, 2015 22:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nuria/702e8c3d6246dedca431 to your computer and use it in GitHub Desktop.
Save nuria/702e8c3d6246dedca431 to your computer and use it in GitHub Desktop.
-- geocoded data on webrequest is like:
-- {"city":"Unknown","country_code":"--","longitude":"-1","postal_code":"Unknown","timezone":"Unknown","subdivision":"Unknown","continent":"Unknown","latitude":"-1","country":"Unknown"}
-- find records where by city we have less than 10 unique IPs
use wmf;
select wr1.client_ip, geocoded_data["city"] from webrequest as wr1 where year=2015 and month=09 and hour=01
and wr1.client_ip in (select wr2.client_ip from webrequest wr2 where year=2015 and month=09 and hour=01 group by wr2.client_ip having count(*) <10);
-- cities with less than 10 ips
select geocoded_data['country'], geocoded_data['city'],
count(distinct client_ip) as cnt,
collect_set(client_ip) as unique_ips_for_this_city
from webrequest
where year=2015 and month=09 and hour=01
and geocoded_data['country'] not in ('Unknown', '--', '-')
and geocoded_data['city'] not in ('Unknown', '--', '-')
group by geocoded_data['country'], geocoded_data['city']
having count(distinct client_ip) < 10
;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment