Skip to content

Instantly share code, notes, and snippets.

@MilesMcBain
Last active February 4, 2021 08:05
Show Gist options
  • Save MilesMcBain/8441f0a418bb4e8c6df7785f054906dd to your computer and use it in GitHub Desktop.
Save MilesMcBain/8441f0a418bb4e8c6df7785f054906dd to your computer and use it in GitHub Desktop.
rolling_date_merge.R
library(tidyverse)
oms_db <-
tribble( ~id, ~data, ~start, ~end,
1, 1, '2020-01-01', '2020-01-04',
1, 2, '2020-01-04', '3000-01-01',
2, 1, '2020-01-01', '2020-01-04',
2, 2, '2020-01-04', '3000-01-01',
3, 1, '2020-01-01', '3000-01-01')
gis_db <-
tribble( ~id, ~data, ~start, ~end,
1, 1, '2019-12-31', '2020-01-02',
1, 2, '2020-01-02', '2020-01-03',
1, 3, '2020-01-03', '3000-01-01',
3, 1, '2020-10-01', '3000-01-01')
gis_meta <-
gis_db %>%
select(id, start, end) %>%
mutate(source = "gis", gis_change = row_number()) %>%
pivot_longer(cols = c("start", "end"), names_to = "date_type", values_to = "date")
oms_meta <-
oms_db %>%
select(id, start, end) %>%
mutate(source = "oms", oms_change = row_number()) %>%
pivot_longer(cols = c("start", "end"), names_to = "date_type", values_to = "date")
bind_rows(
gis_meta,
oms_meta
) %>%
arrange(id, date) %>%
group_by(id) %>%
fill(gis_change, oms_change, .direction = "down") %>%
ungroup() %>%
select(id, oms_change, gis_change, date_type, date) %>%
pivot_wider(names_from = "date_type", values_from = "date") %>%
mutate(
start = map_chr(start, min),
end = map_chr(end, min)
)
@anthonynorth
Copy link

anthonynorth commented Feb 4, 2021

So, I figured out how to make it clean. The overlap condition is all I needed after all (where I started), but I needed to clean up the start & end dates.

SELECT
    [@StationHistory].Id,
    StationVersion, LocationVersion,
    StartDate = (
        CASE 
            WHEN [@LocationHistory].StartDate > [@StationHistory].StartDate
            -- is not first
            AND Lag(1) OVER (PARTITION BY [@StationHistory].Id ORDER BY [@StationHistory].StartDate) = 1
            THEN [@LocationHistory].StartDate
            ELSE [@StationHistory].StartDate
        END
    ),
    EndDate = (
        CASE
            WHEN [@LocationHistory].EndDate < [@StationHistory].EndDate
            THEN [@LocationHistory].EndDate
            ELSE [@StationHistory].EndDate
        END
    )
FROM
    @StationHistory
    LEFT JOIN @LocationHistory ON [@LocationHistory].Id = [@StationHistory].Id
        AND [@LocationHistory].EndDate > [@StationHistory].StartDate
        AND [@LocationHistory].StartDate < [@StationHistory].EndDate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment