@vlas-ilya
Created September 6, 2016 17:16
Grouping by factor variables and finding outliers within each group
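import numpy as np
# Flag values more than 2 standard deviations from their group mean (1 = outlier, 0 = not)
is_outlier = lambda group: (np.abs(group - group.mean()) > 2 * group.std()).astype(int)
factors = df.select_dtypes(exclude=[np.number]).columns.tolist()
# Assumes df has exactly one numeric column; take it from the transformed frame
df['is_outlier'] = df.groupby(factors).transform(is_outlier).iloc[:, 0]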
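A minimal, self-contained sketch of the same idea on a toy data frame. The helper name `flag_outliers`, the parameter `k`, and the columns `group` and `value` are made up for illustration and are not part of the gist. It assumes a single numeric column of interest and treats every non-numeric column as a grouping factor.

```python
import numpy as np
import pandas as pd


def flag_outliers(df, value_col, k=2.0):
    """Return 0/1 flags: 1 where value_col lies more than k group
    standard deviations away from its group mean."""
    # Every non-numeric column is treated as a grouping factor.
    factors = df.select_dtypes(exclude=[np.number]).columns.tolist()
    grouped = df.groupby(factors)[value_col]
    # transform() broadcasts each group's statistic back to the original rows.
    deviation = (df[value_col] - grouped.transform("mean")).abs()
    return (deviation > k * grouped.transform("std")).astype(int)


# Hypothetical toy data: group "a" contains one extreme value, group "b" is constant.
toy = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 6,
    "value": [1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
})
toy["is_outlier"] = flag_outliers(toy, "value")
print(toy)
```

Only the value 10.0 in group "a" is flagged here. Note that pandas uses the sample standard deviation (ddof=1), and a group with a single row has an undefined standard deviation, so its row is never flagged.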
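# Factor columns define the groups; drop = FALSE keeps a data frame even with a single factor column
groups <- lapply(test[, sapply(test, is.factor), drop = FALSE], factor)
# Assumes test has exactly one numeric column, so each group is a numeric vector
tr1 <- split(test[, sapply(test, is.numeric)], groups)
tr2 <- lapply(tr1, function(x) ifelse(abs(x - mean(x)) < 2 * sd(x), 0, 1))
test$is_outlier <- unsplit(tr2, groups)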