Skip to content

Instantly share code, notes, and snippets.

@Bemmu
Created February 16, 2016 07:00
Show Gist options
  • Save Bemmu/da5c79e9027bd3e248b7 to your computer and use it in GitHub Desktop.
Save Bemmu/da5c79e9027bd3e248b7 to your computer and use it in GitHub Desktop.
BigQuery for most popular countries on Hacker News
SELECT
b.country,
SUM(score) as total_score,
SUM(a.c) as mention_count,
ROUND(SUM(score) / SUM(a.c), 2) as avg_score
FROM
(
SELECT
a.word,
SUM(a.score) AS score,
COUNT(1) AS c
FROM (
SELECT
SPLIT(REGEXP_REPLACE(title, '[^a-zA-Z ]', ''), ' ') AS word,
score
FROM
[fh-bigquery:hackernews.stories]) a
GROUP BY
a.word
) a
JOIN
[countries.country] b
ON
a.word = b.aka
GROUP BY b.country
HAVING mention_count >= 100
ORDER BY avg_score DESC
LIMIT 100
@JSJvR
Copy link

JSJvR commented Feb 17, 2016

Your query doesn't pick up any countries with multi-word names (e.g. South Africa, North/South Korea) except for the ones that have single word aliases (e.g. USA, UAE, Saudi).

According to your results South Africa and the Koreas have never been mentioned on HN or Reddit which is obviously not true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment