@anjackson
Created October 3, 2014 09:06
Comparing Perl file reading methods for hash calculation
opf:perl andy$ time perl sha256-asfile.pl ~/Downloads/ubuntu-12.10-desktop-amd64.iso
256a2cc652ec86ff366907fd7b878e577b631cc6c6533368c615913296069d80 /Users/andy/Downloads/ubuntu-12.10-desktop-amd64.iso
real 0m8.825s
user 0m8.102s
sys 0m0.479s
opf:perl andy$ time perl sha256-slurp.pl ~/Downloads/ubuntu-12.10-desktop-amd64.iso
256a2cc652ec86ff366907fd7b878e577b631cc6c6533368c615913296069d80 /Users/andy/Downloads/ubuntu-12.10-desktop-amd64.iso
real 0m20.203s
user 0m13.245s
sys 0m3.424s
@anjackson

The scripts are here: https://github.com/anjackson/keeping-codes/tree/gh-pages/experiments/checksum-benchmarking/perl

It seems that the read_file method (a.k.a. slurp) performs badly for large files: on this ISO it took about 20 s of wall-clock time against about 9 s for the file-based method, with far more system time (3.4 s against 0.5 s). If it reads the whole file into memory, then that additional memory management may be responsible. If the result only looks like an in-memory byte array but is actually backed by data streams, the fault may instead lie with the way the content is buffered, or with other aspects of sysread.

I tried to understand what the slurp code is doing, and it does seem to load the whole file into memory. If this relies on the Perl interpreter transparently growing the buffer as required, then there's probably a lot of malloc and memcpy going on.
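
For anyone who wants to reproduce the comparison without the linked scripts, here is a minimal sketch of the two approaches. It is not a copy of sha256-asfile.pl or sha256-slurp.pl. The sub names are hypothetical, and the whole-file variant uses the core `local $/` idiom as a stand-in for File::Slurp's read_file so that it runs with core modules only, so its timings may differ from those above.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Streaming variant: hand the file name to Digest::SHA and let it read
# the file itself, so the script never holds the whole file in memory.
sub sha256_streaming {
    my ($path) = @_;
    return Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
}

# Whole-file variant: read everything into one scalar, then hash it.
# ASSUMPTION: this stands in for File::Slurp's read_file; it is not
# the code from the linked repository.
sub sha256_whole_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };   # undef $/ means "read to EOF"
    close $fh;
    $data = '' unless defined $data;
    return sha256_hex($data);
}

# Print both digests for each file named on the command line.
for my $path (@ARGV) {
    printf "%s %s (streaming)\n",  sha256_streaming($path),  $path;
    printf "%s %s (whole file)\n", sha256_whole_file($path), $path;
}
```

Both subs should return the same digest for the same file. Timing each one separately with `time perl ...` on a large file, as in the transcript above, is what would show whether holding the whole file in memory is the expensive part.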
