# Gist: brunaw/1f5ae69643f60424b858da5cf9f31caa
library(tidyverse)
library(RecordLinkage)
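# Example data: two spellings of the same person plus an unrelated name
# ("Outro Nome" is Portuguese for "Other Name")
df <- data.frame(name = c("Agent Peggy Carter", "Peggy Carter", "Outro Nome"),
                 sum = 1:3) %>%
  mutate_if(is.factor, as.character)  # needed when stringsAsFactors = TRUE (R < 4.0)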
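
# Approach 1: always compare each name with the one right before it.
# This assumes near-duplicates sit in adjacent rows (e.g. after sorting).
# Note: levenshteinSim() returns a similarity (1 = identical), not a distance.
df %>%
  mutate(dist = levenshteinSim(name, lag(name, default = "")),
         # similarity above 0.5: treat the row as a variant of the previous name
         final_name = ifelse(dist > 0.5, lag(name), name)) %>%
  group_by(final_name) %>%
  summarise(sum_final = sum(sum))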
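
# A dependency-free sketch of Approach 1 in base R. It assumes that
# levenshteinSim() equals 1 - (Levenshtein distance) / (length of the longer
# string), as described in the RecordLinkage documentation, and uses
# utils::adist() for the distance. lev_sim() and merge_with_previous() are
# hypothetical helpers written for illustration, not part of any package.
lev_sim <- function(a, b) {
  # element-wise: distance between a[i] and b[i]
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  len <- pmax(nchar(a), nchar(b))
  # two empty strings count as identical (avoids 0 / 0)
  ifelse(len == 0, 1, 1 - d / len)
}

merge_with_previous <- function(name, value, threshold = 0.5) {
  prev <- c("", head(name, -1))           # previous name; "" for the first row
  sim <- lev_sim(name, prev)
  final_name <- ifelse(sim > threshold, prev, name)
  sums <- tapply(value, final_name, sum)  # total per merged name
  data.frame(final_name = names(sums), sum_final = as.vector(sums),
             stringsAsFactors = FALSE)
}

# Same data as above: rows 1 and 2 merge into "Agent Peggy Carter" (total 3),
# while "Outro Nome" stays on its own (total 3)
merge_with_previous(c("Agent Peggy Carter", "Peggy Carter", "Outro Nome"), 1:3)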
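
# Approach 2, more general: compare every name with every other name.
# For each candidate name .x, build one row per original name with its similarity.
dists <- df$name %>%
  purrr::map_dfr(~ data.frame(or_name = df$name,  # original name of each row
                              sum = df$sum,
                              name = .x,          # candidate canonical name
                              dist = levenshteinSim(df$name, .x),
                              stringsAsFactors = FALSE))

dists %>%
  # keep only pairs similar enough to be the same entity
  # (every name matches itself with similarity 1, so no row is lost)
  filter(dist > 0.5) %>%
  # map each original name to the first similar name, in df order
  group_by(or_name) %>%
  summarise(final_name = first(name), sum = first(sum)) %>%
  group_by(final_name) %>%
  summarise(sum_final = sum(sum))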
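
# A dependency-free sketch of Approach 2 in base R, under the same assumption
# about how levenshteinSim() is defined. It builds the full n x n similarity
# matrix with utils::adist() and maps each name to the first (in data order)
# name whose similarity exceeds the threshold. Because every name matches
# itself with similarity 1, no row is dropped as long as threshold < 1.
# merge_all_pairs() is a hypothetical helper written for illustration.
merge_all_pairs <- function(name, value, threshold = 0.5) {
  d <- utils::adist(name)                       # pairwise Levenshtein distances
  len <- outer(nchar(name), nchar(name), pmax)  # length of the longer string
  sim <- ifelse(len == 0, 1, 1 - d / len)       # keeps the matrix shape
  # for each row, which.max() on a logical vector returns the first TRUE
  first_match <- apply(sim > threshold, 1, which.max)
  final_name <- name[first_match]
  sums <- tapply(value, final_name, sum)
  data.frame(final_name = names(sums), sum_final = as.vector(sums),
             stringsAsFactors = FALSE)
}

merge_all_pairs(c("Agent Peggy Carter", "Peggy Carter", "Outro Nome"), 1:3)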