Samuel samuel-bohman

  • Stockholm, Sweden
@samuel-bohman
samuel-bohman / SparkR_vs_sparklyr.R
Created March 5, 2020 17:19
SparkR versus sparklyr
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
sc <- sparkR.session(master = "local")
df1 <- read.df("nycflights13.csv", source = "csv", header = "true", inferSchema = "true")
### SUMMARY TABLE WITH SQL
createOrReplaceTempView(df1, "tbl1")
summ <- sql("select month, avg(dep_time) as avg_dep, avg(arr_time) as avg_arr from tbl1 where month in (1, 3, 5) group by month")
head(summ)
# month avg_dep avg_arr
# 1 1 1347.210 1523.155
samuel-bohman / user_preferences.json
Last active September 18, 2019 10:32
RStudio IDE Keyboard Shortcuts in JupyterLab
{
"shortcuts": [
{
"command": "application:activate-next-tab",
"keys": [
"Ctrl Shift ]"
],
"selector": "body",
"disabled": true
},
# Compare configurations
h_configs <- dtwclust::compare_clusterings_configs(
types = "hierarchical",
k = 2L:30L,
controls = list(
hierarchical = hierarchical_control(
method = "all"
# distmat = d # Optional precomputed cross-distance matrix
)
),
samuel-bohman / boyer_moore.R
Created January 30, 2018 15:56
Boyer–Moore Majority Vote Algorithm
# https://www.cs.utexas.edu/~moore/best-ideas/mjrty/index.html
x <- c("A", "A", "A", "C", "C", "B", "B", "C", "C", "C", "B", "C", "C") # 7 C's out of 13
bv <- function(x) {
v <- c()
i <- 0
for (j in seq_along(x)) {
if (i == 0) {
v <- x[j]
i <- 1
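
The preview above is cut off, so the rest of `bv` is not reproduced here. As a rough illustration of the technique named in the title, the following is a minimal C++ sketch of the two-pass majority vote. It is not the gist author's implementation; the function name `majority_vote` and the verification pass are assumptions made for this example.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not from the gist): returns the element that occurs
// in more than half of the positions of xs, or std::nullopt if there is none.
std::optional<std::string> majority_vote(const std::vector<std::string>& xs) {
    std::string candidate;
    std::size_t count = 0;

    // Pass 1: keep one candidate and a counter. A matching element raises
    // the counter, a differing one lowers it, and a counter of zero means
    // the next element becomes the new candidate.
    for (const auto& x : xs) {
        if (count == 0) {
            candidate = x;
            count = 1;
        } else if (x == candidate) {
            ++count;
        } else {
            --count;
        }
    }

    // Pass 2: the survivor is only a candidate, so count its occurrences
    // and confirm that it really is a strict majority.
    std::size_t occurrences = 0;
    for (const auto& x : xs) {
        if (x == candidate) {
            ++occurrences;
        }
    }
    if (occurrences * 2 > xs.size()) {
        return candidate;
    }
    return std::nullopt;
}
```

Called on the sample vector from the gist (7 C's out of 13), this sketch returns "C". The second pass matters because the first pass always yields some candidate, even when no element holds a majority.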
samuel-bohman / boyer_moore_generalization.cpp
Last active January 30, 2018 15:57
Boyer–Moore Majority Vote Algorithm Generalization
#include <iostream>
#include <bits/stdc++.h>
using namespace std;
struct Element {
int value;
int count;
};
samuel-bohman / k_means.R
Last active October 5, 2017 15:42
k-means algorithm
k_means <- function(x, k, iter.max = 10) {
random_index <- sample(1:k, nrow(x), replace = TRUE)
data_w_cluster <- cbind(x, clusterID = random_index)
iterations <- 1
plot(data_w_cluster[, 1:2], xaxt = "n", yaxt = "n")
legend("topright", paste0("i = ", 0), bg = NULL)
while(TRUE) {
centroids <- matrix(rep(0, times = k * ncol(x)), nrow = k, ncol = ncol(x))
for(i in 1:k) {
obs_of_cluster_i <- data_w_cluster$clusterID == i
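
The R preview ends inside the centroid loop, so the remainder of `k_means` is not available. For orientation, here is a hedged C++ sketch of the same general scheme (Lloyd's algorithm: recompute centroids, then reassign points, until nothing changes or `iter_max` is reached). It is not a port of the gist: the plotting is omitted, the initial labels are passed in rather than drawn at random so that the result is reproducible, and skipping empty clusters is an assumption of this sketch.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

using Point = std::vector<double>;

// Hypothetical sketch: labels holds the initial cluster index (0 .. k-1)
// of each point; the function returns the labels after convergence or
// after iter_max iterations, whichever comes first.
std::vector<int> k_means(const std::vector<Point>& points,
                         std::vector<int> labels, int k, int iter_max = 10) {
    if (labels.size() != points.size()) {
        throw std::invalid_argument("labels and points must have equal length");
    }
    if (points.empty() || k <= 0) {
        return labels;
    }
    const std::size_t dim = points[0].size();
    const std::size_t n_clusters = static_cast<std::size_t>(k);

    for (int iter = 0; iter < iter_max; ++iter) {
        // Update step: each centroid is the mean of its assigned points.
        std::vector<Point> centroids(n_clusters, Point(dim, 0.0));
        std::vector<std::size_t> sizes(n_clusters, 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            const std::size_t c = static_cast<std::size_t>(labels[i]);
            ++sizes[c];
            for (std::size_t d = 0; d < dim; ++d) {
                centroids[c][d] += points[i][d];
            }
        }
        for (std::size_t c = 0; c < n_clusters; ++c) {
            if (sizes[c] == 0) continue;  // assumption: leave empty clusters out
            for (std::size_t d = 0; d < dim; ++d) {
                centroids[c][d] /= static_cast<double>(sizes[c]);
            }
        }

        // Assignment step: move each point to its nearest centroid
        // (squared Euclidean distance).
        bool changed = false;
        for (std::size_t i = 0; i < points.size(); ++i) {
            int best = labels[i];
            double best_dist = std::numeric_limits<double>::infinity();
            for (std::size_t c = 0; c < n_clusters; ++c) {
                if (sizes[c] == 0) continue;
                double dist = 0.0;
                for (std::size_t d = 0; d < dim; ++d) {
                    const double diff = points[i][d] - centroids[c][d];
                    dist += diff * diff;
                }
                if (dist < best_dist) {
                    best_dist = dist;
                    best = static_cast<int>(c);
                }
            }
            if (best != labels[i]) {
                labels[i] = best;
                changed = true;
            }
        }
        if (!changed) break;  // converged
    }
    return labels;
}
```

Passing the starting labels explicitly keeps the sketch deterministic; a caller who wants the gist's behaviour would generate them at random before the call.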
samuel-bohman / jaccard.R
Last active October 4, 2017 21:19
Function for calculating the Jaccard similarity and distance coefficients
jaccard <- function(x, m) {
if (m == 1 || m == 2) {
M_00 <- apply(x, m, sum) == 0
M_11 <- apply(x, m, sum) == 2
if (m == 1) {
x <- x[!M_00, ]
JSim <- sum(M_11) / nrow(x)
} else {
x <- x[, !M_00]
JSim <- sum(M_11) / ncol(x)
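
The preview is truncated before the distance coefficient is computed. The visible part counts positions where both binary vectors are 1 (`M_11`) and divides by the number of positions that are not 0 in both (`M_00` removed). The C++ sketch below illustrates that calculation for two equal-length 0/1 vectors. The function names are hypothetical, and returning 0 when both vectors are all zeros is an assumption of this sketch (the ratio is undefined in that case).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Jaccard similarity of two binary vectors: |both 1| / |at least one 1|.
double jaccard_similarity(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("vectors must have equal length");
    }
    std::size_t both = 0;    // positions where both are 1 (M_11)
    std::size_t either = 0;  // positions left after dropping M_00
    for (std::size_t i = 0; i < a.size(); ++i) {
        const bool in_a = a[i] != 0;
        const bool in_b = b[i] != 0;
        if (in_a && in_b) ++both;
        if (in_a || in_b) ++either;
    }
    if (either == 0) {
        return 0.0;  // assumption: define similarity as 0 for two empty sets
    }
    return static_cast<double>(both) / static_cast<double>(either);
}

// Jaccard distance is the complement of the similarity.
double jaccard_distance(const std::vector<int>& a, const std::vector<int>& b) {
    return 1.0 - jaccard_similarity(a, b);
}
```

For example, the vectors (1, 1, 0, 0, 1) and (1, 0, 0, 1, 1) share two positions out of four non-empty ones, so both the similarity and the distance are 0.5.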