@jhofman
Created February 14, 2019 19:02
minimal example to show that md5sums of two files containing the same figure are different when saved as pdfs but not pngs due to the "created at" timestamp embedded in the pdf metadata
#
# file: compare_figure_md5sums.R
#
# description: minimal example to show that md5sums of two files
# containing the same figure are different when saved as pdfs but not
# pngs due to the "created at" timestamp embedded in the pdf metadata
#
# usage: Rscript compare_figure_md5sums.R
#
# requirements: tidyverse; the comparison step calls the macOS `md5`
# command (on Linux, substitute `md5sum`)
#
# author: jake hofman (gmail: jhofman)
#
library(tidyverse)
########################################
# GENERATE A FIGURE, SAVE AS PDF AND PNG
########################################
# this seems not to matter, but just to be safe
set.seed(42)
# generate first version of a plot
p1 <- data.frame(x = 1:5, y = 1:5) %>%
  ggplot(aes(x, y)) +
  geom_point()
# save a pdf and a png version
pdf1 <- tempfile(fileext = '.pdf')
ggsave(filename = pdf1, plot = p1, width = 4, height = 4)
png1 <- tempfile(fileext = '.png')
ggsave(filename = png1, plot = p1, width = 4, height = 4)
# wait 2 seconds so the "created at" timestamps, which are recorded to
# the second, differ
Sys.sleep(2)
########################################
# DO THE SAME EXACT THING AGAIN
########################################
# this seems not to matter, but just to be safe
set.seed(42)
# generate the same plot
p2 <- data.frame(x = 1:5, y = 1:5) %>%
  ggplot(aes(x, y)) +
  geom_point()
# save a pdf and a png version
pdf2 <- tempfile(fileext = '.pdf')
ggsave(filename = pdf2, plot = p2, width = 4, height = 4)
png2 <- tempfile(fileext = '.png')
ggsave(filename = png2, plot = p2, width = 4, height = 4)
########################################
# COMPARE MD5 CHECKSUMS
########################################
# pdfs differ
print("### comparing pdfs")
system(sprintf('md5 %s %s', pdf1, pdf2))
# example output:
# MD5 (/var/folders/67/k9k0tm253s18k4yty1sl3k9h0000gn/T//RtmpkQzQyF/file88816dc03691.pdf) = 5944905b7051d32614cbe95e41644567
# MD5 (/var/folders/67/k9k0tm253s18k4yty1sl3k9h0000gn/T//RtmpkQzQyF/file8881612c52d8.pdf) = c314a7fb262324fba4bc8085d582882c
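# to confirm that the timestamps are what differ, print them from each
# pdf. this is a sketch that assumes R's pdf device (which ggsave uses
# for .pdf files) writes them as plain-text "/CreationDate (D:...)" and
# "/ModDate (D:...)" lines in the pdf's document information dictionary.
# pdf_timestamps() is a hypothetical helper, not part of any package
pdf_timestamps <- function(path) {
  # skipNul drops embedded nul bytes found in the compressed binary streams
  lines <- readLines(path, warn = FALSE, skipNul = TRUE)
  # useBytes avoids errors on bytes that aren't valid text in the locale
  grep("^/(CreationDate|ModDate)", lines, value = TRUE, useBytes = TRUE)
}
print(pdf_timestamps(pdf1))
print(pdf_timestamps(pdf2))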
# pngs match
print("### comparing pngs")
system(sprintf('md5 %s %s', png1, png2))
# example output:
# MD5 (/var/folders/67/k9k0tm253s18k4yty1sl3k9h0000gn/T//RtmpkQzQyF/file888119f5ef38.png) = 9935417db872915ee86e6b3533a1c9bc
# MD5 (/var/folders/67/k9k0tm253s18k4yty1sl3k9h0000gn/T//RtmpkQzQyF/file888152d71a8c.png) = 9935417db872915ee86e6b3533a1c9bc
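########################################
# COMPARE PDFS WITH TIMESTAMPS REMOVED
########################################
# a sketch, assuming R's pdf device writes the timestamps on their own
# "/CreationDate (D:...)" and "/ModDate (D:...)" lines: drop those lines
# and check whether everything else in the two pdfs is identical.
# pdf_without_timestamps() is a hypothetical helper, not part of any package
pdf_without_timestamps <- function(path) {
  # skipNul drops embedded nul bytes found in the compressed binary streams
  lines <- readLines(path, warn = FALSE, skipNul = TRUE)
  # useBytes avoids errors on bytes that aren't valid text in the locale
  lines[!grepl("^/(CreationDate|ModDate)", lines, useBytes = TRUE)]
}
print("### comparing pdfs with timestamps removed")
# should print TRUE if the timestamps are the only difference
print(identical(pdf_without_timestamps(pdf1), pdf_without_timestamps(pdf2)))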