Skip to content

Instantly share code, notes, and snippets.

@tfausak
Last active January 11, 2018 17:22
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tfausak/e80def1ea2494902bd4a83dc928bf577 to your computer and use it in GitHub Desktop.
Save tfausak/e80def1ea2494902bd4a83dc928bf577 to your computer and use it in GitHub Desktop.
Benchmarking CRC32 in Haskell and Rust for Rocket League.

I am the maintainer of Rattletrap, a Rocket League replay parser written in Haskell. Recently I became aware of @nickbabcock's Boxcars project, which is also a Rocket League replay parser but it's written in Rust. I generally think of Haskell as Fast Enough ™️ for a high-level language. I was curious to see how much faster Rust would be for this particular application. You can see some casual benchmarks here: nickbabcock/boxcars#6.

This Gist benchmarks four basic operations in both Haskell and Rust. I am a Haskell programmer and want to see it triumph, but it turns out Rust is at least an order of magnitude faster in these benchmarks. However, in the benchmark that actuall does something (the CRC benchmark), Haskell and Rust are within one millisecond of each other. If you think there are any problems with the benchmark, please let me know. Also feel free to contribute faster Haskell (or Rust!) code. I've done my best.

These benchmarks were run against a 4,324,393 byte Rocket League replay file. It was the largest replay file I had on hand. You can see its details here: https://www.rocketleaguereplays.com/replays/a4416dea-40c8-17ef-51f3-4a8bb341942f/. You can download it directly here: https://media.rocketleaguereplays.com/uploads/replay_files/A4416DEA40C817EF51F34A8BB341942F.replay.

The benchmarks were run on a 2016 15-inch MacBook Pro. The Haskell code uses GHC 8.2.2. The Rust code uses rustc 1.25.0 nightly.

Function Description Haskell Rust Notes
1 nothing 0 ns 0 ns n/a
2 maximum 2,524,000 ns 149,247 ns Rust is 17x faster
3 fold 18,790,000 ns 105,423 ns Rust is 178x faster
4 crc 12,630,000 ns 11,518,491 ns Rust is 1.1x faster
  1. This benchmark does absolutely nothing.

    -- 14 ns
    f1 :: ByteString -> Word8
    f1 = const 0xff
    // 0 ns
    fn f1(_: &[u8]) -> u8 {
        0xff
    }
  2. This benchmark finds the largest byte in a string of bytes. It does so as fast as possible. In Haskell, this is a thin wrapper around some C code.

    -- 3,662,000 ns
    f2 :: ByteString -> Word8
    f2 = maximum
    // 115,061 ns
    fn f2(bytes: &[u8]) -> u8 {
        let mut max = 0x00;
        for byte in bytes { if *byte > max { max = *byte } }
        max
    }
  3. This benchmark has the same goal as the previous one, but it uses a fold (aka reduce in some languages) instead of the fastest possible thing. As promised, folds are a zero-cost abstraction in Rust.

    -- 45,570,000 ns
    f3 :: ByteString -> Word8
    f3 = foldl' max 0x00
    // 120,626 ns
    fn f3(bytes: &[u8]) -> u8 {
        bytes.iter().fold(0x00, |a, e| if *e > a { *e } else { a })
    }
  4. This benchmark computes the CRC32 of the replay using Rocket League's bespoke CRC32 algorithm. It uses the "standard implementation" listed on Fast CRC32, which uses a pre-computed table.

    -- 454,900,000 ns
    f4 :: ByteString -> Word32
    f4 = foldl' update 0x00
    update crc byte = xor
      (unsafeShiftL crc 8)
      (unsafeIndex
        table
        (fromIntegral (xor byte (fromIntegral (unsafeShiftR crc 24)))))
    // 10,529,933 ns
    fn f4(bytes: &[u8]) -> u32 {
        let mut crc = 0x10340dfe;
        for byte in bytes {
            crc = (crc << 8) ^ TABLE[(*byte ^ ((crc >> 24) as u8)) as usize]
        }
        !crc
    }
-- stack --resolver nightly-2018-01-09 script --compile
{-# OPTIONS_GHC -O2 #-}
import qualified Data.Bits as Bits
import qualified Data.ByteString as Bytes
import qualified Data.Vector.Unboxed as Vector
import qualified Data.Word as Word
import qualified Gauge
import qualified Prelude
main :: Prelude.IO ()
main = do
bytes <- Bytes.readFile "A4416DEA40C817EF51F34A8BB341942F.replay"
Gauge.defaultMain
[ Gauge.bench "const 0xff" (Gauge.whnf f1 bytes) -- 288.5 ps
, Gauge.bench "maximum" (Gauge.whnf f2 bytes) -- 2.524 ms
, Gauge.bench "foldl' max 0x00" (Gauge.whnf f3 bytes) -- 18.79 ms
, Gauge.bench "crc" (Gauge.whnf f4 bytes) -- 12.63 ms
]
f1 :: Bytes.ByteString -> Word.Word8
f1 = Prelude.const 0xff
f2 :: Bytes.ByteString -> Word.Word8
f2 = Bytes.maximum
f3 :: Bytes.ByteString -> Word.Word8
f3 = Bytes.foldl' Prelude.max 0x00
f4 :: Bytes.ByteString -> Word.Word32
f4 = crc32
crc32 :: Bytes.ByteString -> Word.Word32
crc32 bytes = Bits.complement (Bytes.foldl' crc32Update (Bits.complement 0xefcbf201) bytes)
crc32Update :: Word.Word32 -> Word.Word8 -> Word.Word32
crc32Update crc byte = Bits.xor
(Bits.unsafeShiftL crc 8)
(Vector.unsafeIndex
crc32Table
(word8ToInt (Bits.xor byte (word32ToWord8 (Bits.unsafeShiftR crc 24)))))
word32ToWord8 :: Word.Word32 -> Word.Word8
word32ToWord8 = Prelude.fromIntegral
word8ToInt :: Word.Word8 -> Prelude.Int
word8ToInt = Prelude.fromIntegral
crc32Table :: Vector.Vector Word.Word32
crc32Table = Vector.fromList
[ 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9
, 0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005
, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61
, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd
, 0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9
, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75
, 0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011
, 0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd
, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039
, 0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5
, 0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81
, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d
, 0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49
, 0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95
, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1
, 0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d
, 0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae
, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072
, 0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16
, 0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca
, 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde
, 0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02
, 0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066
, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba
, 0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e
, 0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692
, 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6
, 0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a
, 0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e
, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2
, 0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686
, 0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a
, 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637
, 0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb
, 0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f
, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53
, 0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47
, 0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b
, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff
, 0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623
, 0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7
, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b
, 0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f
, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3
, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7
, 0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b
, 0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f
, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3
, 0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640
, 0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c
, 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8
, 0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24
, 0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30
, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec
, 0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088
, 0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654
, 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0
, 0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c
, 0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18
, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4
, 0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0
, 0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c
, 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668
, 0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
]
#![feature(test)]
extern crate test;
use test::Bencher;
const BYTES: [u8; 4324393] = *include_bytes!("A4416DEA40C817EF51F34A8BB341942F.replay");
#[bench] fn bench_1(b: &mut Bencher) { b.iter(|| assert_eq!(f1(&BYTES), 0xff)) } // 0 ns/iter
#[bench] fn bench_2(b: &mut Bencher) { b.iter(|| assert_eq!(f2(&BYTES), 0xff)) } // 149,247 ns/iter
#[bench] fn bench_3(b: &mut Bencher) { b.iter(|| assert_eq!(f3(&BYTES), 0xff)) } // 105,423 ns/iter
#[bench] fn bench_4(b: &mut Bencher) { b.iter(|| assert_eq!(f4(&BYTES), 0x0ceed553)) } // 11,518,491 ns/iter
fn f1(_: &[u8]) -> u8 { 0xff }
fn f2(bytes: &[u8]) -> u8 {
let mut max = 0x00;
for byte in bytes { if *byte > max { max = *byte } }
max
}
fn f3(bytes: &[u8]) -> u8 {
bytes.iter().fold(0x00, |a, e| if *e > a { *e } else { a })
}
fn f4(bytes: &[u8]) -> u32 {
let mut crc = !0xefcbf201;
for byte in bytes {
crc = (crc << 8) ^ CRC_32_TABLE[(*byte ^ ((crc >> 24) as u8)) as usize]
}
!crc
}
const CRC_32_TABLE: [u32; 256] =
[ 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9
, 0x130476dc, 0x17c56b6b, 0x1a864db2, 0x1e475005
, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6, 0x2b4bcb61
, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd
, 0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9
, 0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75
, 0x6a1936c8, 0x6ed82b7f, 0x639b0da6, 0x675a1011
, 0x791d4014, 0x7ddc5da3, 0x709f7b7a, 0x745e66cd
, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039
, 0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5
, 0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81
, 0xad2f2d84, 0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d
, 0xd4326d90, 0xd0f37027, 0xddb056fe, 0xd9714b49
, 0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95
, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1
, 0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d
, 0x34867077, 0x30476dc0, 0x3d044b19, 0x39c556ae
, 0x278206ab, 0x23431b1c, 0x2e003dc5, 0x2ac12072
, 0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16
, 0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca
, 0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde
, 0x6b93dddb, 0x6f52c06c, 0x6211e6b5, 0x66d0fb02
, 0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1, 0x53dc6066
, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba
, 0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e
, 0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692
, 0x8aad2b2f, 0x8e6c3698, 0x832f1041, 0x87ee0df6
, 0x99a95df3, 0x9d684044, 0x902b669d, 0x94ea7b2a
, 0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e
, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2
, 0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686
, 0xd5b88683, 0xd1799b34, 0xdc3abded, 0xd8fba05a
, 0x690ce0ee, 0x6dcdfd59, 0x608edb80, 0x644fc637
, 0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb
, 0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f
, 0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53
, 0x251d3b9e, 0x21dc2629, 0x2c9f00f0, 0x285e1d47
, 0x36194d42, 0x32d850f5, 0x3f9b762c, 0x3b5a6b9b
, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff
, 0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623
, 0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7
, 0xe22b20d2, 0xe6ea3d65, 0xeba91bbc, 0xef68060b
, 0xd727bbb6, 0xd3e6a601, 0xdea580d8, 0xda649d6f
, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3
, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7
, 0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b
, 0x9b3660c6, 0x9ff77d71, 0x92b45ba8, 0x9675461f
, 0x8832161a, 0x8cf30bad, 0x81b02d74, 0x857130c3
, 0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640
, 0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c
, 0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8
, 0x68860bfd, 0x6c47164a, 0x61043093, 0x65c52d24
, 0x119b4be9, 0x155a565e, 0x18197087, 0x1cd86d30
, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec
, 0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088
, 0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654
, 0xc5a92679, 0xc1683bce, 0xcc2b1d17, 0xc8ea00a0
, 0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb, 0xdbee767c
, 0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18
, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4
, 0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0
, 0x9abc8bd5, 0x9e7d9662, 0x933eb0bb, 0x97ffad0c
, 0xafb010b1, 0xab710d06, 0xa6322bdf, 0xa2f33668
, 0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
];
@bitemyapp
Copy link

After disabling threading, -O2 still on:

$ stack exec rattletrap-benchmark-hs
benchmarked const 0xff
time                 294.8 ps   (292.1 ps .. 297.6 ps)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 300.7 ps   (297.9 ps .. 305.1 ps)
std dev              11.16 ps   (6.575 ps .. 15.21 ps)
variance introduced by outliers: 19% (moderately inflated)

benchmarked maximum
time                 2.552 ms   (2.547 ms .. 2.557 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.557 ms   (2.555 ms .. 2.560 ms)
std dev              8.073 μs   (6.665 μs .. 10.31 μs)

benchmarked foldl' max 0x00
time                 20.34 ms   (20.23 ms .. 20.47 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 20.24 ms   (20.20 ms .. 20.29 ms)
std dev              103.8 μs   (79.64 μs .. 152.9 μs)

benchmarked crc
time                 12.83 ms   (12.79 ms .. 12.87 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 12.83 ms   (12.81 ms .. 12.85 ms)
std dev              49.85 μs   (35.12 μs .. 76.04 μs)

@bitemyapp
Copy link

Rust, non-descript toml:

running 4 tests
test bench_1 ... bench:           0 ns/iter (+/- 0)
test bench_2 ... bench:     116,602 ns/iter (+/- 10,577)
test bench_3 ... bench:     112,458 ns/iter (+/- 1,763)
test bench_4 ... bench:  11,542,696 ns/iter (+/- 134,570)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment