@romain9292
Last active November 1, 2019 11:05
[Compare two data frames with R] Find the rows missing from one of two datasets using SQLDF #R #datascience #data #compare
library(sqldf)
# Load the two datasets (comma-separated, so read.csv rather than read.csv2)
df_1 <- read.csv('/Users/romain/Downloads/df_1.csv', stringsAsFactors = FALSE)
df_2 <- read.csv('/Users/romain/Desktop/df_2.csv', stringsAsFactors = FALSE)
# Isolate the column to compare (here `email`, as used below);
# single-bracket indexing keeps a data frame and preserves the column name
df_1 <- df_1["email"]
df_2 <- df_2["email"]
# Using SQL notation (EXCEPT returns the distinct rows of df_1 that are absent from df_2)
df1NotIndf2 <- sqldf('SELECT * FROM df_1 EXCEPT SELECT * FROM df_2')
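# Using the %in% operator: rows of df_1 whose email does not appear in df_2
# (drop = FALSE keeps the result a data frame even with a single column)
df1NotIndf2 <- df_1[!df_1$email %in% df_2$email, , drop = FALSE]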
# Using the subset function: same filter, with `email` evaluated inside df_1
subset(df_1, !(email %in% df_2$email))
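# A minimal, self-contained check of the comparison above, as a sketch on
# made-up data: toy_1, toy_2 and the addresses are hypothetical and are not
# taken from the original datasets. It assumes both data frames share an
# `email` column, and it uses base R only.
toy_1 <- data.frame(email = c("a@example.com", "b@example.com",
                              "b@example.com", "c@example.com"),
                    stringsAsFactors = FALSE)
toy_2 <- data.frame(email = c("a@example.com", "d@example.com"),
                    stringsAsFactors = FALSE)
# %in% keeps every non-matching row of toy_1, duplicates included
only_in_1 <- toy_1[!toy_1$email %in% toy_2$email, , drop = FALSE]
# setdiff() returns distinct values only, which is also how SQL EXCEPT behaves
distinct_only_in_1 <- setdiff(toy_1$email, toy_2$email)
# The two approaches can therefore return different row counts when the
# compared column contains duplicates.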