Skip to content

Instantly share code, notes, and snippets.

@ekalosak

ekalosak/foobar.csv

Created Feb 14, 2020
Embed
What would you like to do?
Visualize high-dimensional optimization runs
GOLD.chili_peppers GOLD.ham SEPIA.chili_peppers SEPIA.ham score step step.quart
otter 5.5555555555556 lion 1.11111111111112 693.805 1 1
gila monster 2.22222222222224 ocelot 3.33333333333336 343.59 2 1
gila monster 4.44444444444448 rhinoceros 1.11111111111112 409.635 3 1
gila monster 2.22222222222224 rhinoceros 4.44444444444448 390.759 4 2
otter 8.88888888888896 rhinoceros 5.5555555555556 830.934 5 2
gila monster 0 rhinoceros 5.5555555555556 361.278 6 2
gila monster 0 silver fox 1.11111111111112 717.802 7 3
gila monster 1.11111111111112 ocelot 7.77777777777784 933.906 8 3
gila monster 1.11111111111112 ocelot 1.11111111111112 408.426 9 3
gnu 6.66666666666672 rhinoceros 7.77777777777784 872.502 10 4
gnu 2.22222222222224 silver fox 6.66666666666672 941.119 11 4

Using R version 3.6.2 Start the R repl. Load your data. Your data should be in the format of the attached foobar.csv. The first few columns are the X values suggested by the optimizer. Score is the Y value. Step is the, well, step. Step.quart is the quartile of the step.

states$step.quart <- as.factor(ntile(states$step, 4))

Then use this plotting code:

lower_continuous_plot <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_hex(..., bins = 4) +
    scale_fill_viridis_c(option = "inferno")
}

lower_combo_plot <- function(data, mapping, ...) {
  x.str <- str_remove(as.character(mapping["x"]), "~")
  y.str <- str_remove(as.character(mapping["y"]), "~")
  col.str <- str_remove(as.character(mapping["color"]), "~")
  if (class(data[, x.str]) == "numeric") {
    pp <- ggplot(data, mapping) +
      geom_violin(aes_string(x = y.str, y = x.str, fill = col.str)) +
      scale_fill_viridis_d() +
      coord_flip()
  } else {
    pp <- ggplot(data, mapping) + geom_violin()
  }
  return(pp)
}

upper_combo_plot <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(..., aes(alpha = 0.4)) +
    guides(alpha = F) +
    scale_color_viridis_c(option = "viridis")
}

upper_continuous_plot <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(..., aes(alpha = 0.4)) +
    guides(alpha = F) +
    scale_color_viridis_c(option = "viridis")
}

diagnonal_continuous_plot <- function(data, mapping, ...) {
  x.str <- str_remove(as.character(mapping["x"]), "~")
  nbins <- length(unique(data[, x.str]))
  ggplot(data = data, mapping = mapping) +
    geom_histogram(..., aes(fill = step.quart), bins = nbins) +
    scale_fill_viridis_d()
}

p <- ggpairs(states,
  columns = which(colnames(states) %in% state.names),
  aes(color = step, size = score),
  lower = list(
    continuous = lower_continuous_plot,
    combo = lower_combo_plot
  ),
  upper = list(
    continuous = upper_continuous_plot,
    combo = upper_combo_plot
    # continuous = upper_continuous_plot,
    # discrete = lower_discrete_plot,
    # na = lower_discrete_plot
  ),
  diag = list(
    continuous = diagnonal_continuous_plot
  ),
  legend = c(1, 2)
) + theme_bw()
@ekalosak

This comment has been minimized.

Copy link
Owner Author

@ekalosak ekalosak commented Feb 14, 2020

This is the expected result.
Screen Shot 2020-02-13 at 4 37 56 PM
The diagonal is the marginal frequency of each X in the optimization run. The lower triangle is similar - a 2d marginal frequency. The upper triangle is the sequence of steps, as described by the legend on the rhs. Diagonal plots for continuous variables show, via color, the proportion of samples at that x-value suggested at a particular step.quantile.

@lawrennd

This comment has been minimized.

Copy link

@lawrennd lawrennd commented Feb 14, 2020

Really nice, thanks @ekalosak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment