Here we find world population by timezone (from an extremely reliable source: an online quiz).
First, we can try using rvest to grab the results. But there's a problem: the UTC offsets only appear on the page once the quiz has actually been played, so the scraped offset column comes back as NA:
library(tidyverse)
library(rvest)
page <- read_html("https://www.sporcle.com/games/segacs/time-zones-in-order-of-population")
# The quiz answers are in the second table on the page
df <- page %>%
  html_nodes("table") %>%
  .[2] %>%
  html_table() %>%
  .[[1]]
df %>%
  head()
# Population (est.) Time Zone (UTC offset) Notable Includes
# 1 1,578,332,733 NA China, Malaysia, Central Indonesia, Philippines, Western Australia
# 2 1,313,334,597 NA India, Sri Lanka
# 3 779,333,045 NA Central Europe, Western Africa
# 4 585,168,798 NA European Russia, Arabia, East Africa
# 5 445,080,472 NA Eastern Europe, Middle East, Central and Southern Africa
# 6 398,538,888 NA Southeast Asia, West Indonesia
But we can manually visit the page, start the quiz, and select 'give up', which loads the answers (UTC offsets). Then, in the browser, select File -> Save page as. I saved it in my Downloads folder as tz.html
# Path is relative to the working directory; adjust it to wherever you saved the file
updated_html <- read_html("Downloads/tz.html")
updated_df <- updated_html %>%
  html_nodes("table") %>%
  .[2] %>%
  html_table() %>%
  .[[1]]
updated_df %>% head()
# Population (est.) Time Zone (UTC offset) Notable Includes
# 1 1,578,332,733 UTC +08:00 China, Malaysia, Central Indonesia, Philippines, Western Australia
# 2 1,313,334,597 UTC +05:30 India, Sri Lanka
# 3 779,333,045 UTC +01:00 Central Europe, Western Africa
# 4 585,168,798 UTC +03:00 European Russia, Arabia, East Africa
# 5 445,080,472 UTC +02:00 Eastern Europe, Middle East, Central and Southern Africa
# 6 398,538,888 UTC +07:00 Southeast Asia, West Indonesia
(out <- updated_df %>%
  rename(
    utc_offset = 'Time Zone (UTC offset)',
    population = 'Population (est.)',
    includes = 'Notable Includes') %>%
  mutate(
    # Last two characters of the offset are the minutes, e.g. "30" in "UTC +05:30"
    offset_minutes = substr(utc_offset, nchar(utc_offset) - 1, nchar(utc_offset)),
    # TRUE when the offset is a whole number of hours
    utc_offset_is_an_hour_multiple_of_one = offset_minutes == "00",
    # Strip thousands separators before converting to numeric
    population = str_remove_all(population, ",") %>% as.numeric()
  ) %>%
  group_by(utc_offset_is_an_hour_multiple_of_one) %>%
  summarise(total_pop = sum(population)))
# # A tibble: 2 x 2
#   utc_offset_is_an_hour_multiple_of_one  total_pop
#   <lgl>                                      <dbl>
# 1 FALSE                                 1537594167
# 2 TRUE                                  5782896345
1537594167 / (1537594167 + 5782896345)
# [1] 0.2100398
So the answer is that about 21% of the world's population lives in a time zone whose UTC offset isn't a whole number of hours.
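The proportion above was typed in by hand from the summary table. As a minimal sketch, the same share can be computed straight from the data. This version assumes base R only and uses just the first six rows of the quiz table as sample data, so its result is not the full-data 21%. The names `sample_df`, `is_whole_hour` and `share_non_whole` are made up for illustration.

```r
# Sample data: first six rows of the quiz table (NOT the full data set)
sample_df <- data.frame(
  population = c(1578332733, 1313334597, 779333045,
                 585168798, 445080472, 398538888),
  utc_offset = c("UTC +08:00", "UTC +05:30", "UTC +01:00",
                 "UTC +03:00", "UTC +02:00", "UTC +07:00"),
  stringsAsFactors = FALSE
)

# An offset is a whole number of hours when its minutes part is ":00"
is_whole_hour <- endsWith(sample_df$utc_offset, ":00")

# Share of the sample population living in a non-whole-hour offset
share_non_whole <- sum(sample_df$population[!is_whole_hour]) /
  sum(sample_df$population)
share_non_whole
```

Running the same two steps on the full table should reproduce the 0.21 figure without copying numbers out of the console.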
Here's an easy way to get the cleaned-up data.frame into R in the future. Simply paste the following code into R:
df <- structure(list(population = c(1578332733, 1313334597, 779333045,
585168798, 445080472, 398538888, 301108339, 288163120, 252535680,
241618458, 240604308, 208231089, 187360619, 99051407, 79620200,
61526901, 54364022, 40307463, 32872730, 32564342, 31400559, 30986975,
24213510, 6157399, 1971761, 1919028, 1713232, 773395, 731910,
528336, 299586, 55593, 9264, 8809, 2784, 600, 360, 200, 0), utc_offset = c("UTC +08:00",
"UTC +05:30", "UTC +01:00", "UTC +03:00", "UTC +02:00", "UTC +07:00",
"UTC -05:00", "UTC +05:00", "UTC ±00:00", "UTC -03:00", "UTC -06:00",
"UTC +06:00", "UTC +09:00", "UTC -04:00", "UTC +03:30", "UTC -08:00",
"UTC +06:30", "UTC +04:00", "UTC -07:00", "UTC +04:30", "UTC +10:00",
"UTC +05:45", "UTC +08:30", "UTC +12:00", "UTC +09:30", "UTC +11:00",
"UTC -10:00", "UTC -01:00", "UTC -09:00", "UTC -03:30", "UTC +13:00",
"UTC -11:00", "UTC -09:30", "UTC +14:00", "UTC -02:00", "UTC +12:45",
"UTC +10:30", "UTC +08:45", "UTC -12:00"), includes = c("China, Malaysia, Central Indonesia, Philippines, Western Australia",
"India, Sri Lanka", "Central Europe, Western Africa", "European Russia, Arabia, East Africa",
"Eastern Europe, Middle East, Central and Southern Africa", "Southeast Asia, West Indonesia",
"Eastern Time (US/Can), Cuba, Haiti, South America", "Pakistan, Central Asia, Maldives",
"Coordinated Universal Time - UK, West Africa", "Argentina, Brazil, Falklands, Uruguay, Greenland",
"Central Time (US/Can/Mex), Central America, Galapagos, Easter Island",
"Bangladesh, Bhutan, Omsk, East Kazakhstan, Xinjiang (unofficial)",
"Japan, South Korea", "Atlantic Time (Can), Caribbean, Venezuela, Bolivia, Guyana, Amazonas (Brazil)",
"Iran", "Pacific Time (US/Can), Pitcairn Islands", "Myanmar, Cocos Islands",
"Samara, Caucasus, Gulf", "Mountain Time (US/Can/Mex)", "Afghanistan",
"Eastern Australia, Papua New Guinea, Primorsky", "Nepal", "North Korea",
"New Zealand, Fiji", "Central Australia (NT, SA)", "New Caledonia, Solomon Islands, Vanuatu",
"French Polynesia, Hawaii, Cook Islands", "Cabo Verde, Azores, Eastern Greenland",
"Alaska, Gambier Islands", "Newfoundland, Southeastern Labrador",
"Samoa, Tonga", "American Samoa, Niue", "Marquesas Islands (French Polynesia)",
"Line Islands (Kiribati)", "Fernando de Noronha, South Georgia and South Sandwich Islands",
"Chatham Islands (New Zealand)", "Lord Howe Island (Australia)",
"Eucla (Australia) - unofficial", "Howland Island, Baker Island (USA) - uninhabited"
)), class = "data.frame", row.names = c(NA, -39L))
df %>% head
# population utc_offset includes
# 1 1578332733 UTC +08:00 China, Malaysia, Central Indonesia, Philippines, Western Australia
# 2 1313334597 UTC +05:30 India, Sri Lanka
# 3 779333045 UTC +01:00 Central Europe, Western Africa
# 4 585168798 UTC +03:00 European Russia, Arabia, East Africa
# 5 445080472 UTC +02:00 Eastern Europe, Middle East, Central and Southern Africa
# 6 398538888 UTC +07:00 Southeast Asia, West Indonesia
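If you'd rather work with numeric offsets than strings, here is a rough sketch of one way to convert them. It assumes every value has the fixed "UTC ±HH:MM" layout seen in the table, and `parse_utc_offset` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: turn "UTC +05:30" into signed decimal hours (5.5).
# Assumes the string always ends in <sign>HH:MM, as in the quiz table.
parse_utc_offset <- function(x) {
  n <- nchar(x)
  hours   <- as.numeric(substr(x, n - 4, n - 3))
  minutes <- as.numeric(substr(x, n - 1, n))
  # Anything other than "-" (i.e. "+" or the plus-minus sign) counts as positive
  sign    <- ifelse(substr(x, n - 5, n - 5) == "-", -1, 1)
  sign * (hours + minutes / 60)
}

# A few offsets taken from the table; "\u00b1" is the plus-minus sign
offsets <- c("UTC +08:00", "UTC +05:30", "UTC -03:30",
             "UTC \u00b100:00", "UTC +12:45")
hours <- parse_utc_offset(offsets)

# TRUE where the offset is not a whole number of hours
hours %% 1 != 0
```

With a numeric column like this, the whole-hour check becomes `hours %% 1 == 0` instead of string slicing, and the offsets can also be sorted or plotted directly.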