Skip to content

Instantly share code, notes, and snippets.

@mike-seekwell
Last active June 18, 2020 14:14
Show Gist options
  • Save mike-seekwell/83ac75c82a943e287a7abe6b73b73f02 to your computer and use it in GitHub Desktop.
Save mike-seekwell/83ac75c82a943e287a7abe6b73b73f02 to your computer and use it in GitHub Desktop.
SELECT
httpRequest.remoteIp,
httpRequest.referer,
FORMAT_TIMESTAMP('%Y-%m-%d', timestamp) as visited_on,
httpRequest.requestUrl,
count(1) as ct
FROM
`logs.appengine_googleapis_com_*`
WHERE 1=1
AND httpRequest.referer not like '%.css%'
AND httpRequest.referer not like '%.ru/%'
AND httpRequest.requestUrl != '/robots.txt'
AND httpRequest.requestUrl not like '%/sitemap%'
AND httpRequest.requestUrl not like '%favicon%'
-- a bunch more specific to my app, e.g. app.seekwell.io
AND httpRequest.requestUrl not like '/wp-login%'
AND httpRequest.requestUrl not like '%login.php%'
AND httpRequest.requestUrl not like '%.php.%'
AND httpRequest.requestUrl not like '%.well-known%'
AND httpRequest.requestMethod != 'POST'
-- User Agents / Bots
AND httpRequest.userAgent not like '%Googlebot%'
AND httpRequest.userAgent not like '%AhrefsBot%'
AND httpRequest.userAgent not like '%DuckDuckBot%'
AND httpRequest.userAgent not like '%Bingbot%'
AND httpRequest.userAgent not like '%YandexBot%'
AND httpRequest.userAgent not like '%Slurp%'
AND httpRequest.userAgent not like '%Baiduspider%'
AND httpRequest.userAgent not like '%Spider/%'
AND httpRequest.userAgent not like '%Exabot/%'
AND httpRequest.userAgent not like '%Konqueror/%'
AND httpRequest.userAgent not like '%ia_archiver%'
group by 1, 2, 3, 4
ORDER BY 3 DESC
limit 16000
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment