Created
February 16, 2016 07:00
-
-
Save Bemmu/da5c79e9027bd3e248b7 to your computer and use it in GitHub Desktop.
BigQuery for most popular countries on Hacker News
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT | |
b.country, | |
SUM(score) as total_score, | |
SUM(a.c) as mention_count, | |
ROUND(SUM(score) / SUM(a.c), 2) as avg_score | |
FROM | |
( | |
SELECT | |
a.word, | |
SUM(a.score) AS score, | |
COUNT(1) AS c | |
FROM ( | |
SELECT | |
SPLIT(REGEXP_REPLACE(title, '[^a-zA-Z ]', ''), ' ') AS word, | |
score | |
FROM | |
[fh-bigquery:hackernews.stories]) a | |
GROUP BY | |
a.word | |
) a | |
JOIN | |
[countries.country] b | |
ON | |
a.word = b.aka | |
GROUP BY b.country | |
HAVING mention_count >= 100 | |
ORDER BY avg_score DESC | |
LIMIT 100 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Your query doesn't pick up any countries with multi-word names (e.g. South Africa, North/South Korea) except for the ones that have single word aliases (e.g. USA, UAE, Saudi).
According to your results South Africa and the Koreas have never been mentioned on HN or Reddit which is obviously not true.