@jgm
Created October 28, 2012 04:04
Benchmark of various reimplementations of Data.Char.isSpace
{-# OPTIONS_GHC -Wall -fwarn-tabs #-}
{-# LANGUAGE ForeignFunctionInterface #-}
----------------------------------------------------------------
-- Modified by John MacFarlane from an earlier benchmark by
-- wren ng thornton.
---------------------------------------------------------------
module Main (main) where
import qualified Data.Char as C
import Foreign.C.Types (CInt(..))
import Criterion (bench, bgroup, nf)
import Criterion.Main (defaultMain)
----------------------------------------------------------------
-- N.B. \x9..\xD == "\t\n\v\f\r"
foreign import ccall unsafe "u_iswspace"
    iswspace :: CInt -> CInt
-- | Verbatim version of 'Data.Char.isSpace' (i.e., 'GHC.Unicode.isSpace'
-- as of base-4.2.0.2).
isSpace_DataChar :: Char -> Bool
{-# INLINE isSpace_DataChar #-}
isSpace_DataChar c =
    c == ' '     ||
    c == '\t'    ||
    c == '\n'    ||
    c == '\r'    ||
    c == '\f'    ||
    c == '\v'    ||
    c == '\xa0'  ||
    iswspace (fromIntegral (C.ord c)) /= 0
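
-- | Rejects the range '\x21'..'\x9f' in the first guard, so most
-- characters of ordinary text never reach the later tests.
isSpace_Alt :: Char -> Bool
{-# INLINE isSpace_Alt #-}
isSpace_Alt c | c > '\x20' && c < '\xa0' = False
              | c == ' '                 = True
              | '\t' <= c && c <= '\r'   = True
              | c == '\xa0'              = True
              | otherwise                = iswspace (fromIntegral (C.ord c)) /= 0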

-- | Like isSpace_Alt, but also rejects '\xa1'..'\xff' and everything
-- below '\t' without calling iswspace.
isSpace_Alt' :: Char -> Bool
{-# INLINE isSpace_Alt' #-}
isSpace_Alt' c | c > '\x20' && c < '\xa0'  = False
               | c == ' '                  = True
               | c > '\xa0' && c <= '\xff' = False
               | '\t' <= c && c <= '\r'    = True
               | c < '\t'                  = False
               | c == '\xa0'               = True
               | otherwise                 = iswspace (fromIntegral (C.ord c)) /= 0
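
-- | Tests the known whitespace characters first (with '\t'..'\r' as one
-- range check) and sends every other character to iswspace.
isSpace_Pattern :: Char -> Bool
{-# INLINE isSpace_Pattern #-}
isSpace_Pattern c | c == ' '               = True
                  | '\t' <= c && c <= '\r' = True
                  | c == '\xa0'            = True
                  | otherwise              = iswspace (fromIntegral (C.ord c)) /= 0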
----------------------------------------------------------------
main :: IO ()
main = do
    let text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit,\nsed do eiusmod tempor incididunt ut labore et\ndolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut\naliquip ex ea commodo consequat. Duis aute irure dolor\nin reprehenderit in voluptate velit esse cillum dolore\neu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,\nsunt in culpa qui officia deserunt mollit anim id est laborum.\n"
    let chars = ['\0'..'\255']
    let upperchars = ['\0'..'\2048']
    defaultMain
        [ bgroup "lorem ipsum"
            [ bench "isSpace_DataChar" $ nf (map isSpace_DataChar) text
            , bench "isSpace_Alt" $ nf (map isSpace_Alt) text
            , bench "isSpace_Alt'" $ nf (map isSpace_Alt') text
            , bench "isSpace_Pattern" $ nf (map isSpace_Pattern) text
            ]
        , bgroup "chars 0..255"
            [ bench "isSpace_DataChar" $ nf (map isSpace_DataChar) chars
            , bench "isSpace_Alt" $ nf (map isSpace_Alt) chars
            , bench "isSpace_Alt'" $ nf (map isSpace_Alt') chars
            , bench "isSpace_Pattern" $ nf (map isSpace_Pattern) chars
            ]
        , bgroup "chars 0..2048"
            [ bench "isSpace_DataChar" $ nf (map isSpace_DataChar) upperchars
            , bench "isSpace_Alt" $ nf (map isSpace_Alt) upperchars
            , bench "isSpace_Alt'" $ nf (map isSpace_Alt') upperchars
            , bench "isSpace_Pattern" $ nf (map isSpace_Pattern) upperchars
            ]
        ]
jgm commented Oct 28, 2012

Benchmarks:

warming up
estimating clock resolution...
mean is 2.838068 us (320001 iterations)
found 73566 outliers among 319999 samples (23.0%)
  67597 (21.1%) low severe
  5969 (1.9%) high severe
estimating cost of a clock call...
mean is 80.03109 ns (23 iterations)
found 2 outliers among 23 samples (8.7%)
  1 (4.3%) high mild
  1 (4.3%) high severe

benchmarking lorem ipsum/isSpace_DataChar
mean: 167.6821 us, lb 167.4299 us, ub 168.2323 us, ci 0.950
std dev: 1.829837 us, lb 740.2509 ns, ub 3.182035 us, ci 0.950

benchmarking lorem ipsum/isSpace_Alt
mean: 34.36398 us, lb 34.30728 us, ub 34.47729 us, ci 0.950
std dev: 393.1567 ns, lb 224.1218 ns, ub 644.0638 ns, ci 0.950

benchmarking lorem ipsum/isSpace_Alt'
mean: 35.40429 us, lb 35.34127 us, ub 35.63812 us, ci 0.950
std dev: 559.4270 ns, lb 131.6690 ns, ub 1.300654 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
  1 (1.0%) high severe
variance introduced by outliers: 8.493%
variance is slightly inflated by outliers

benchmarking lorem ipsum/isSpace_Pattern
mean: 92.03414 us, lb 91.87379 us, ub 92.23703 us, ci 0.950
std dev: 921.6294 ns, lb 751.2158 ns, ub 1.132215 us, ci 0.950

benchmarking chars 0..255/isSpace_DataChar
mean: 105.5900 us, lb 104.8196 us, ub 107.0775 us, ci 0.950
std dev: 5.295460 us, lb 3.004813 us, ub 8.202836 us, ci 0.950
found 13 outliers among 100 samples (13.0%)
  3 (3.0%) high mild
  10 (10.0%) high severe
variance introduced by outliers: 48.447%
variance is moderately inflated by outliers

benchmarking chars 0..255/isSpace_Alt
mean: 50.60586 us, lb 49.68629 us, ub 52.55842 us, ci 0.950
std dev: 6.576251 us, lb 3.662170 us, ub 11.25040 us, ci 0.950
found 8 outliers among 100 samples (8.0%)
  4 (4.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 87.283%
variance is severely inflated by outliers

benchmarking chars 0..255/isSpace_Alt'
mean: 32.06238 us, lb 32.01606 us, ub 32.13059 us, ci 0.950
std dev: 285.3981 ns, lb 210.8640 ns, ub 372.0094 ns, ci 0.950

benchmarking chars 0..255/isSpace_Pattern
mean: 59.54869 us, lb 59.31934 us, ub 60.43369 us, ci 0.950
std dev: 2.052471 us, lb 441.9723 ns, ub 4.790058 us, ci 0.950
found 13 outliers among 100 samples (13.0%)
  7 (7.0%) high mild
  6 (6.0%) high severe
variance introduced by outliers: 30.653%
variance is moderately inflated by outliers

benchmarking chars 0..2048/isSpace_DataChar
mean: 839.7385 us, lb 837.4789 us, ub 842.6115 us, ci 0.950
std dev: 12.99311 us, lb 10.58307 us, ub 16.80527 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
  5 (5.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 8.471%
variance is slightly inflated by outliers

benchmarking chars 0..2048/isSpace_Alt
mean: 621.9947 us, lb 620.6333 us, ub 623.6288 us, ci 0.950
std dev: 7.646395 us, lb 6.532614 us, ub 9.394852 us, ci 0.950

benchmarking chars 0..2048/isSpace_Alt'
mean: 699.9485 us, lb 697.5734 us, ub 705.9943 us, ci 0.950
std dev: 18.13457 us, lb 8.958339 us, ub 37.15073 us, ci 0.950
found 15 outliers among 100 samples (15.0%)
  14 (14.0%) high severe
variance introduced by outliers: 19.973%
variance is moderately inflated by outliers

benchmarking chars 0..2048/isSpace_Pattern
mean: 476.5560 us, lb 474.8012 us, ub 478.7411 us, ci 0.950
std dev: 10.03762 us, lb 8.306136 us, ub 12.78500 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  3 (3.0%) high mild
variance introduced by outliers: 14.208%
variance is moderately inflated by outliers
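
Read as ratios, the lorem ipsum means in the run above put isSpace_Alt at about 4.9x and isSpace_Alt' at about 4.7x the speed of isSpace_DataChar, with isSpace_Pattern at about 1.8x. A minimal sketch of that arithmetic follows. The names `loremMeans`, `speedupOver`, and `loremSpeedups` are made up for illustration and are not part of the gist; the numbers are copied from the output above.

```haskell
-- Mean times in microseconds, copied from the "lorem ipsum" group above.
-- (Hypothetical helper names; only the numbers come from the benchmark.)
loremMeans :: [(String, Double)]
loremMeans =
  [ ("isSpace_DataChar", 167.6821)
  , ("isSpace_Alt",      34.36398)
  , ("isSpace_Alt'",     35.40429)
  , ("isSpace_Pattern",  92.03414)
  ]

-- How many times faster a run with mean t is than the baseline mean.
speedupOver :: Double -> Double -> Double
speedupOver baseline t = baseline / t

-- Speedup of every variant relative to the first entry (isSpace_DataChar).
loremSpeedups :: [(String, Double)]
loremSpeedups = case loremMeans of
  []            -> []
  (_, base) : _ -> [ (name, speedupOver base t) | (name, t) <- loremMeans ]
```

In GHCi, `mapM_ print loremSpeedups` lists the ratios; the baseline entry is 1.0 by construction. The same helper can be pointed at the other groups by swapping in their means.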

Compiled with -O2:

warming up
estimating clock resolution...
mean is 2.947041 us (320001 iterations)
found 80073 outliers among 319999 samples (25.0%)
  74975 (23.4%) low severe
  5098 (1.6%) high severe
estimating cost of a clock call...
mean is 81.17004 ns (24 iterations)
found 1 outliers among 24 samples (4.2%)
  1 (4.2%) high mild

benchmarking lorem ipsum/isSpace_DataChar
mean: 30.14521 us, lb 29.96999 us, ub 30.33899 us, ci 0.950
std dev: 946.3883 ns, lb 778.0518 ns, ub 1.208575 us, ci 0.950
found 7 outliers among 100 samples (7.0%)
  1 (1.0%) low severe
  5 (5.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 26.757%
variance is moderately inflated by outliers

benchmarking lorem ipsum/isSpace_Alt
mean: 12.60212 us, lb 12.53900 us, ub 12.71222 us, ci 0.950
std dev: 416.1512 ns, lb 279.8485 ns, ub 633.4982 ns, ci 0.950
found 16 outliers among 100 samples (16.0%)
  7 (7.0%) high mild
  9 (9.0%) high severe
variance introduced by outliers: 28.710%
variance is moderately inflated by outliers

benchmarking lorem ipsum/isSpace_Alt'
mean: 12.13401 us, lb 12.10168 us, ub 12.20219 us, ci 0.950
std dev: 230.2353 ns, lb 116.9494 ns, ub 384.2021 ns, ci 0.950
found 8 outliers among 100 samples (8.0%)
  7 (7.0%) high severe
variance introduced by outliers: 12.267%
variance is moderately inflated by outliers

benchmarking lorem ipsum/isSpace_Pattern
mean: 26.31246 us, lb 26.27771 us, ub 26.36846 us, ci 0.950
std dev: 222.7832 ns, lb 151.9722 ns, ub 318.6675 ns, ci 0.950

benchmarking chars 0..255/isSpace_DataChar
mean: 16.18380 us, lb 16.14711 us, ub 16.28942 us, ci 0.950
std dev: 291.8942 ns, lb 121.8686 ns, ub 623.7894 ns, ci 0.950
found 9 outliers among 100 samples (9.0%)
  2 (2.0%) high mild
  7 (7.0%) high severe
variance introduced by outliers: 10.419%
variance is moderately inflated by outliers

benchmarking chars 0..255/isSpace_Alt
mean: 11.23248 us, lb 11.21326 us, ub 11.27168 us, ci 0.950
std dev: 134.6124 ns, lb 77.14355 ns, ub 234.6864 ns, ci 0.950

benchmarking chars 0..255/isSpace_Alt'
mean: 7.975151 us, lb 7.908506 us, ub 8.071530 us, ci 0.950
std dev: 408.8461 ns, lb 308.5389 ns, ub 563.9267 ns, ci 0.950
found 12 outliers among 100 samples (12.0%)
  9 (9.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 49.454%
variance is moderately inflated by outliers

benchmarking chars 0..255/isSpace_Pattern
mean: 15.77091 us, lb 15.74314 us, ub 15.82813 us, ci 0.950
std dev: 196.2705 ns, lb 97.60339 ns, ub 326.9984 ns, ci 0.950

benchmarking chars 0..2048/isSpace_DataChar
mean: 120.7414 us, lb 120.4466 us, ub 121.1279 us, ci 0.950
std dev: 1.715293 us, lb 1.405423 us, ub 2.289000 us, ci 0.950
found 7 outliers among 100 samples (7.0%)
  6 (6.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 7.505%
variance is slightly inflated by outliers

benchmarking chars 0..2048/isSpace_Alt
mean: 121.7447 us, lb 121.3160 us, ub 122.4079 us, ci 0.950
std dev: 2.698253 us, lb 1.972786 us, ub 3.891284 us, ci 0.950
found 16 outliers among 100 samples (16.0%)
  7 (7.0%) high mild
  9 (9.0%) high severe
variance introduced by outliers: 15.186%
variance is moderately inflated by outliers

benchmarking chars 0..2048/isSpace_Alt'
mean: 123.8306 us, lb 122.7197 us, ub 125.4704 us, ci 0.950
std dev: 6.858273 us, lb 5.224519 us, ub 9.517053 us, ci 0.950
found 16 outliers among 100 samples (16.0%)
  5 (5.0%) high mild
  11 (11.0%) high severe
variance introduced by outliers: 53.459%
variance is severely inflated by outliers

benchmarking chars 0..2048/isSpace_Pattern
mean: 123.5210 us, lb 122.0242 us, ub 127.5064 us, ci 0.950
std dev: 11.68411 us, lb 5.536108 us, ub 24.44218 us, ci 0.950
found 13 outliers among 100 samples (13.0%)
  6 (6.0%) high mild
  7 (7.0%) high severe
variance introduced by outliers: 76.938%
variance is severely inflated by outliers
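
The timings are only meaningful if the variants return the same answers as Data.Char.isSpace, which the benchmark itself does not check. Below is a rough sketch of such a check, not code from the gist. It copies the guard layout of isSpace_Alt but, as an assumption made so that it builds without the `u_iswspace` C symbol, replaces the FFI fallback with `generalCategory c == Space`. The names `isSpaceAltSketch` and `disagreements` are hypothetical.

```haskell
import qualified Data.Char as C

-- Guard layout taken from isSpace_Alt in the gist.
-- ASSUMPTION: the fallback uses generalCategory instead of the FFI call
-- to u_iswspace, so this is a stand-in, not the benchmarked function.
isSpaceAltSketch :: Char -> Bool
isSpaceAltSketch c
  | c > '\x20' && c < '\xa0' = False
  | c == ' '                 = True
  | '\t' <= c && c <= '\r'   = True
  | c == '\xa0'              = True
  | otherwise                = C.generalCategory c == C.Space

-- Characters from the input on which the sketch and Data.Char.isSpace differ.
disagreements :: [Char] -> [Char]
disagreements = filter (\c -> isSpaceAltSketch c /= C.isSpace c)
```

Evaluating `disagreements ['\0'..'\2048']` covers the widest input range used in the benchmark; an empty result means the sketch and Data.Char.isSpace agree on that range. Checking the real variants would mean substituting them for `isSpaceAltSketch` in a build where the `u_iswspace` import links.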
