@dmpetrov
Created March 6, 2017 05:12
How Much Memory Does A Data Scientist Need (base)
# Code from the blog post:
# https://fullstackml.com/2015/12/06/how-much-memory-does-a-data-scientist-need/
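library(ggplot2)
library(dplyr)

# Input: tab-separated file with (at least) the columns year, sizeGB and freq
file <- "dataset-sizes.cv"
data <- read.csv(file, sep = "\t")

# Keep four years, three years apart
data.slice <- data %>%
  filter(year %in% c(2006, 2009, 2012, 2015))

# Per year: total freq per dataset size, each size's share of users,
# and the cumulative share with sizes sorted ascending within the year
data.slice.cum_freq <- data.slice %>%
  group_by(year, sizeGB) %>%
  summarise(value = sum(freq), .groups = "drop_last") %>%  # still grouped by year
  arrange(sizeGB, .by_group = TRUE) %>%
  mutate(user_prop = value / sum(value),
         cum_freq = cumsum(value) / sum(value))

# Cumulative distribution of dataset sizes (log10 GB), one line per year
ggplot(data.slice.cum_freq, aes(x = log10(sizeGB), y = cum_freq, color = factor(year))) +
  geom_line(aes(group = factor(year)))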
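Here is a minimal base-R sketch of what `cum_freq` contains. It uses hypothetical toy counts that are invented for illustration and are not from the dataset. The sketch reproduces the per-year normalisation that the dplyr pipeline performs: within each year, sizes are sorted, and the running total of `freq` is divided by that year's total. The `toy` data frame and its values are assumptions.

```r
# Hypothetical toy counts, invented for illustration (NOT the real dataset)
toy <- data.frame(
  year   = c(2012, 2012, 2015, 2015, 2015),
  sizeGB = c(1, 10, 1, 10, 100),
  freq   = c(2, 2, 5, 3, 2)
)
# Sort by size within each year so the running sum forms a proper CDF
toy <- toy[order(toy$year, toy$sizeGB), ]
# Cumulative share of users per year: running total / that year's total
toy$cum_freq <- ave(toy$freq, toy$year, FUN = function(v) cumsum(v) / sum(v))
print(toy)
```

Each year's `cum_freq` ends at exactly 1, so every line in the plot finishes at the top of the y axis.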