I am the maintainer of Rattletrap, a Rocket League replay parser written in Haskell. Recently I became aware of @nickbabcock's Boxcars project, which is also a Rocket League replay parser but it's written in Rust. I generally think of Haskell as Fast Enough ™️ for a high-level language. I was curious to see how much faster Rust would be for this particular application. You can see some casual benchmarks here: nickbabcock/boxcars#6.
This Gist benchmarks four basic operations in both Haskell and Rust. I am a Haskell programmer and want to see it triumph, but it turns out Rust is at least an order of magnitude faster in these benchmarks. However, in the benchmark that actuall does something (the CRC benchmark), Haskell and Rust are within one millisecond of each other. If you think there are any problems with the benchmark, please let me know. Also feel free to contribute faster Haskell (or Rust!) code. I've done my best.
These benchmarks were run against a 4,324,393 byte Rocket League replay file. It was the largest replay file I had on hand. You can see its details here: https://www.rocketleaguereplays.com/replays/a4416dea-40c8-17ef-51f3-4a8bb341942f/. You can download it directly here: https://media.rocketleaguereplays.com/uploads/replay_files/A4416DEA40C817EF51F34A8BB341942F.replay.
The benchmarks were run on a 2016 15-inch MacBook Pro.
The Haskell code uses GHC 8.2.2.
The Rust code uses rustc
1.25.0 nightly.
Function | Description | Haskell | Rust | Notes |
---|---|---|---|---|
1 | nothing | 0 ns | 0 ns | n/a |
2 | maximum | 2,524,000 ns | 149,247 ns | Rust is 17x faster |
3 | fold | 18,790,000 ns | 105,423 ns | Rust is 178x faster |
4 | crc | 12,630,000 ns | 11,518,491 ns | Rust is 1.1x faster |
-
This benchmark does absolutely nothing.
-- 14 ns f1 :: ByteString -> Word8 f1 = const 0xff
// 0 ns fn f1(_: &[u8]) -> u8 { 0xff }
-
This benchmark finds the largest byte in a string of bytes. It does so as fast as possible. In Haskell, this is a thin wrapper around some C code.
-- 3,662,000 ns f2 :: ByteString -> Word8 f2 = maximum
// 115,061 ns fn f2(bytes: &[u8]) -> u8 { let mut max = 0x00; for byte in bytes { if *byte > max { max = *byte } } max }
-
This benchmark has the same goal as the previous one, but it uses a fold (aka
reduce
in some languages) instead of the fastest possible thing. As promised, folds are a zero-cost abstraction in Rust.-- 45,570,000 ns f3 :: ByteString -> Word8 f3 = foldl' max 0x00
// 120,626 ns fn f3(bytes: &[u8]) -> u8 { bytes.iter().fold(0x00, |a, e| if *e > a { *e } else { a }) }
-
This benchmark computes the CRC32 of the replay using Rocket League's bespoke CRC32 algorithm. It uses the "standard implementation" listed on Fast CRC32, which uses a pre-computed table.
-- 454,900,000 ns f4 :: ByteString -> Word32 f4 = foldl' update 0x00 update crc byte = xor (unsafeShiftL crc 8) (unsafeIndex table (fromIntegral (xor byte (fromIntegral (unsafeShiftR crc 24)))))
// 10,529,933 ns fn f4(bytes: &[u8]) -> u32 { let mut crc = 0x10340dfe; for byte in bytes { crc = (crc << 8) ^ TABLE[(*byte ^ ((crc >> 24) as u8)) as usize] } !crc }
Rust, non-descript toml: