@mitiko
Last active May 2, 2023 09:51
| model | book1 size (bytes) | book1 encode time | book1 decode time | enwik8 size (bytes) | enwik8 encode time | enwik8 decode time |
|---|---:|---:|---:|---:|---:|---:|
| order0 | 388'064 | 70ms | 100ms | 55'068'843 | 10.3s | 13.3s |
| order0 entropy hashing | 354'005 | 470ms | 490ms | 53'672'845 | 60.9s | 63.2s |
| order0 entropy hashing 8-bit cache | 354'005 | 450ms | 470ms | 53'672'845 | 58.0s | 59.5s |
| order0 entropy hashing 16-bit cache | 354'005 | 250ms | 290ms | 53'672'845 | 35.5s | 41.6s |
| order1 | 303'964 | 70ms | 100ms | 42'054'383 | 9.6s | 13.6s |
| order1 entropy hashing | 264'464 | 1.1s | 1.1s | 35'392'005 | 138s | 144s |
| order1 entropy hashing 8-bit cache | 264'464 | 1s | 1.1s | 35'392'005 | 131s | 136s |
| order1 entropy hashing 16-bit cache | 264'464 | 790ms | 830ms | 35'392'005 | 98s | 103s |

order0 = last 8 bits of history + 3 bits of alignment
order1 = last 16 bits of history + 3 bits of alignment
entropy hashing = 3 bits of alignment + n bits of compressed history (history is limited to 64 bits)
n-bit cache = speeds up entropy hashing by caching the state of the coder + writer after n bits.
Caching is done at runtime.
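
The order0 and order1 definitions above can be sketched as a context index computation. This is a minimal illustration, not the gist's actual code: the function names, the reading of "alignment" as the bit position inside the current byte, and the layout (history in the high bits, alignment in the low 3 bits) are all assumptions.

```rust
/// Number of alignment bits. Assumption: "alignment" is the bit position
/// inside the current byte, which needs 3 bits (0..=7).
const ALIGN_BITS: u32 = 3;

/// Builds a context index from the last `history_bits` bits of history
/// plus the 3 alignment bits.
/// Layout (hypothetical): history in the high bits, alignment in the low bits.
pub fn context_index(history: u64, bit_pos: u64, history_bits: u32) -> usize {
    debug_assert!(history_bits >= 1 && history_bits < 64);
    let mask = (1u64 << history_bits) - 1;
    let hist = history & mask; // keep only the most recent bits
    let align = bit_pos & ((1 << ALIGN_BITS) - 1); // position within the byte
    ((hist << ALIGN_BITS) | align) as usize
}

/// order0 = last 8 bits of history + 3 bits of alignment.
pub fn order0_context(history: u64, bit_pos: u64) -> usize {
    context_index(history, bit_pos, 8)
}

/// order1 = last 16 bits of history + 3 bits of alignment.
pub fn order1_context(history: u64, bit_pos: u64) -> usize {
    context_index(history, bit_pos, 16)
}

/// Number of table slots needed for a given history width.
pub fn table_len(history_bits: u32) -> usize {
    1usize << (history_bits + ALIGN_BITS)
}
```

Under these assumptions an order0 table has `table_len(8)` = 2'048 slots and an order1 table has `table_len(16)` = 524'288 slots. Entropy hashing would replace the raw history bits with n bits of compressed history while keeping the same 3 alignment bits.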

The model used by the entropy hashing coder is tuned on book1 (source: rev_bit_stationary_model.rs).
Entropy hashing optimizes context length for the same amount of memory usage.
It achieves a smooth memory-speed trade-off for greater compression ratios.

Increase the size of the cache to gain speed at the cost of more memory usage.
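
The cache trade-off can be sketched as a lazily filled lookup table with one slot per n-bit prefix. This is only an illustration under several assumptions: `CoderState`, `encode_prefix`, and `PrefixCache` are hypothetical names, the toy `encode_prefix` halves the range per bit instead of using a real model, and the cached prefix is taken to be the n most recent history bits.

```rust
/// Stand-in for the saved coder + writer state (hypothetical fields;
/// the real state is not shown in the gist).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CoderState {
    low: u32,
    range: u32,
}

/// Toy replacement for "run the coder over n bits". NOT a real arithmetic
/// coder: it splits the range in half instead of by a model probability.
fn encode_prefix(prefix: u32, n: u32) -> CoderState {
    let mut s = CoderState { low: 0, range: u32::MAX };
    for i in (0..n).rev() {
        let bit = (prefix >> i) & 1;
        s.range /= 2;
        if bit == 1 {
            s.low = s.low.wrapping_add(s.range);
        }
    }
    s
}

/// Cache with 2^n slots, filled at runtime on first use.
pub struct PrefixCache {
    n: u32,
    slots: Vec<Option<CoderState>>,
    pub misses: usize,
}

impl PrefixCache {
    pub fn new(n: u32) -> Self {
        assert!(n >= 1 && n <= 24);
        Self { n, slots: vec![None; 1 << n], misses: 0 }
    }

    /// Returns the state after coding the n-bit prefix, computing it only once.
    pub fn state_after_prefix(&mut self, history: u64) -> CoderState {
        let prefix = (history & ((1u64 << self.n) - 1)) as u32;
        if let Some(s) = self.slots[prefix as usize] {
            return s; // hit: skip re-coding the first n bits
        }
        self.misses += 1;
        let s = encode_prefix(prefix, self.n);
        self.slots[prefix as usize] = Some(s);
        s
    }

    pub fn slots_len(&self) -> usize {
        self.slots.len()
    }

    /// Rough memory footprint of the slot table in bytes.
    pub fn approx_bytes(&self) -> usize {
        self.slots.len() * std::mem::size_of::<Option<CoderState>>()
    }
}
```

In this sketch an 8-bit cache has 256 slots and a 16-bit cache has 65'536, so each extra cache bit doubles the memory while letting the coder skip one more bit of work on a hit.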

mitiko commented May 2, 2023

Unaligned vs. aligned comparison (aligned uses 3 extra bits of context); sizes in bytes

| model | book1 unaligned | book1 aligned | enwik8 unaligned | enwik8 aligned |
|---|---:|---:|---:|---:|
| order0 | 508'920 | 388'064 | 72'197'572 | 55'068'843 |
| order0 entropy hashing | 547'139 | 354'005 | 76'184'175 | 53'672'845 |
| order1 | 313'711 | 303'964 | 45'459'260 | 42'054'383 |
| order1 entropy hashing | 416'621 | 264'464 | 59'619'618 | 35'392'005 |
