Skip to content

Instantly share code, notes, and snippets.

@attgm
Last active April 3, 2023 08:15
Show Gist options
  • Save attgm/28500074e3d0a89bd5132817b3350b0b to your computer and use it in GitHub Desktop.
Save attgm/28500074e3d0a89bd5132817b3350b0b to your computer and use it in GitHub Desktop.
厚生労働省のコロナ陽性者数の時系列データをクラスタリングして日本地図を色分けする

コロナ陽性者数の時系列データのクラスタリング

厚生労働省のオープンデータにある県別のコロナ陽性者数の推移をK-means法を使ってクラスタリングし、日本地図に図示します。

実際の処理内容としては

  • 1週間の平均値を導出 (l.56 - 58)
  • 期間内の陽性者数が1になるように正規化(l.59)

という処理をしています。

日本地図へのプロットは、RPubsにあったggplot2 で沖縄をずらして日本地図を描きたいのコードを利用しています。

library(tidyverse)
library(ggplot2)
library(lubridate)
library(ggthemes)
# -- Functions for drawing Japan map
# https://rpubs.com/ktgrstsh/775867
library(NipponMap)
library(sf)
map_nipponmap <- read_sf(system.file("shapes/jpn.shp", package = "NipponMap")[1], crs = "WGS84")
shift_okinawa <-
function(data,
col_pref = "name",
pref_value = "Okinawa",
geometry = "geometry",
zoom_rate = 3,
pos = c(4.5, 17.5)) {
row_okinawa <- data[[col_pref]] == pref_value
geo <- data[[geometry]][row_okinawa]
cent <- sf::st_centroid(geo)
geo2 <- (geo - cent) * zoom_rate + cent + pos
data[[geometry]][row_okinawa] <- geo2
return(sf::st_as_sf(data))
}
layer_autoline_okinawa <- function(
x = c(130, 132.5, 136),
xend = c(132.5, 136, 136),
y = c(36, 36, 38),
yend = c(36, 38, 40),
size = ggplot2::.pt / 15
){
ggplot2::annotate("segment",
x = x,
xend = xend,
y = y,
yend = yend,
size = .pt / 15,
colour = 'black'
)
}
# --
# https://www.mhlw.go.jp/stf/covid-19/open-data.html
d <- read_csv("newly_confirmed_cases_daily.csv")
data <- d %>%
pivot_longer(!Date, names_to = "prefecture", values_to = "number") %>%
mutate(date = ymd(Date)) %>%
filter(prefecture != 'ALL') %>%
select(-Date) %>%
mutate(date = floor_date(date, "week")) %>%
group_by(prefecture, date) %>%
summarise(number = mean(number), .groups = "drop_last") %>%
mutate(normalized = number / sum(number)) %>%
ungroup() %>%
select(-number) %>%
pivot_wider(names_from = "date", values_from = "normalized", values_fill = 0) %>%
column_to_rownames("prefecture")
# Number of clusters : Need to derive by Elbow method
CLUSTER_NUM <- 7
km <- kmeans(data, CLUSTER_NUM)
cls <- data.frame(name=names(km$cluster), number=km$cluster)
map <- map_nipponmap %>% inner_join(cls)
g <- ggplot(shift_okinawa(map, zoom_rate=1, pos=c(4.5, 11.5))) +
layer_autoline_okinawa() +
geom_sf(aes(fill = factor(number)), size = .pt / 10) +
theme_void() +
theme(legend.position = 'none')
ggsave(file="covid19_map.pdf", plot=g, width=8, height=8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment