@alexnask
Last active July 5, 2020 05:44
ctregex vs pcre

PCRE2 library and benchmark compiled with -O3 and JIT support.
Pattern compiled with PCRE2_UTF.
ctregex example compiled with --release-fast, using .encoding = .utf8.
Example file of 525MB. The benchmark consists of iterating through every line and searching for the pattern in the line (an unanchored match), ten times for every pattern and setting, after warming up.
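
The procedure described above can be sketched roughly as follows. This is a minimal Python sketch and not the benchmark code that produced the numbers below: `bench_pattern` and `throughput_mb_per_sec` are hypothetical names, Python's `re` module stands in for the engines under test, and counting one hit per matching line is an assumption. The throughput formula (bytes divided by seconds, divided by 10^6) is inferred from the logged figures, e.g. 543369275 bytes in 1270ms is logged as 427.85.

```python
import re
import time


def throughput_mb_per_sec(total_bytes: int, seconds: float) -> float:
    """Bytes per second divided by 10**6 (assumed to be what the log prints)."""
    if seconds <= 0:
        return float("inf")
    return total_bytes / seconds / 1e6


def bench_pattern(pattern: str, lines: list, runs: int = 10, warmup: int = 1) -> list:
    """Search for `pattern` in every line, `runs` times, after `warmup` passes.

    Returns one (lines_matched, elapsed_seconds, mb_per_sec) tuple per timed run.
    """
    regex = re.compile(pattern)
    total_bytes = sum(len(line.encode("utf-8")) for line in lines)

    def one_pass() -> int:
        # Unanchored search: the pattern may match anywhere in the line.
        # Assumption: each line contributes at most one hit.
        return sum(1 for line in lines if regex.search(line) is not None)

    # Warm-up passes are executed but not timed.
    for _ in range(warmup):
        one_pass()

    results = []
    for _ in range(runs):
        start = time.perf_counter()
        found = one_pass()
        elapsed = time.perf_counter() - start
        results.append((found, elapsed, throughput_mb_per_sec(total_bytes, elapsed)))
    return results
```

Calling `bench_pattern(r"[0-9]{2},[0-9]{2}", lines)` on the loaded lines would give ten timed runs for that pattern, mirroring the ten "Took" entries per setting in the output below.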

Loading data...
Took 0.74sec
Loaded 7635717 lines, 543369275Bi
===========================================
Testing pattern "\[ [a-z A-Z 0-9 \- _ {} \\ \ ]* \]" on 7635717 lines...
Pattern found 1428510 times
Took 973ms
558.22MBi/sec
Pattern found 1428510 times
Took 955ms
568.73MBi/sec
Pattern found 1428510 times
Took 951ms
571.12MBi/sec
Pattern found 1428510 times
Took 945ms
574.74MBi/sec
Pattern found 1428510 times
Took 942ms
576.57MBi/sec
Pattern found 1428510 times
Took 974ms
557.65MBi/sec
Pattern found 1428510 times
Took 948ms
572.92MBi/sec
Pattern found 1428510 times
Took 941ms
577.18MBi/sec
Pattern found 1428510 times
Took 980ms
554.23MBi/sec
Pattern found 1428510 times
Took 966ms
562.26MBi/sec
===========================================
===========================================
Testing pattern "διακοπεί" on 7635717 lines...
Pattern found 10 times
Took 522ms
1039.58MBi/sec
Pattern found 10 times
Took 513ms
1057.78MBi/sec
Pattern found 10 times
Took 509ms
1066.08MBi/sec
Pattern found 10 times
Took 509ms
1066.08MBi/sec
Pattern found 10 times
Took 512ms
1059.85MBi/sec
Pattern found 10 times
Took 521ms
1041.58MBi/sec
Pattern found 10 times
Took 511ms
1061.78MBi/sec
Pattern found 10 times
Took 529ms
1025.31MBi/sec
Pattern found 10 times
Took 516ms
1051.03MBi/sec
Pattern found 10 times
Took 515ms
1054.94MBi/sec
===========================================
===========================================
Testing pattern "[0-9]{2},[0-9]{2}" on 7635717 lines...
Pattern found 2804861 times
Took 1466ms
370.65MBi/sec
Pattern found 2804861 times
Took 1526ms
355.89MBi/sec
Pattern found 2804861 times
Took 1535ms
353.92MBi/sec
Pattern found 2804861 times
Took 1479ms
367.19MBi/sec
Pattern found 2804861 times
Took 1450ms
374.62MBi/sec
Pattern found 2804861 times
Took 1450ms
374.65MBi/sec
Pattern found 2804861 times
Took 1473ms
368.64MBi/sec
Pattern found 2804861 times
Took 1459ms
372.26MBi/sec
Pattern found 2804861 times
Took 1455ms
373.41MBi/sec
Pattern found 2804861 times
Took 1487ms
365.19MBi/sec
===========================================
===========================================
Testing pattern "([α-ω]+)" on 7635717 lines...
Pattern found 194 times
Took 1698ms
319.97MBi/sec
Pattern found 194 times
Took 1679ms
323.53MBi/sec
Pattern found 194 times
Took 1622ms
334.82MBi/sec
Pattern found 194 times
Took 1642ms
330.81MBi/sec
Pattern found 194 times
Took 1640ms
331.16MBi/sec
Pattern found 194 times
Took 1710ms
317.61MBi/sec
Pattern found 194 times
Took 1686ms
322.10MBi/sec
Pattern found 194 times
Took 1683ms
322.72MBi/sec
Pattern found 194 times
Took 1650ms
329.26MBi/sec
Pattern found 194 times
Took 1684ms
322.55MBi/sec
===========================================
Loading data...
Took 6.331s
Loaded 7635717 lines, 543369275Bi
===========================================
Testing pattern "\[[a-zA-Z0-9\-_{}\\\ ]*\]" on 7635717 lines...
---------- NO JIT ----------
Pattern found 1428510 times
Took 1270ms
427.85MBi/sec
Pattern found 1428510 times
Took 1278ms
425.172MBi/sec
Pattern found 1428510 times
Took 1242ms
437.495MBi/sec
Pattern found 1428510 times
Took 1238ms
438.909MBi/sec
Pattern found 1428510 times
Took 1263ms
430.221MBi/sec
Pattern found 1428510 times
Took 1264ms
429.881MBi/sec
Pattern found 1428510 times
Took 1437ms
378.128MBi/sec
Pattern found 1428510 times
Took 1334ms
407.323MBi/sec
Pattern found 1428510 times
Took 1278ms
425.172MBi/sec
Pattern found 1428510 times
Took 1294ms
419.914MBi/sec
---------- JIT ----------
Pattern found 1428510 times
Took 1266ms
429.202MBi/sec
Pattern found 1428510 times
Took 1271ms
427.513MBi/sec
Pattern found 1428510 times
Took 1261ms
430.903MBi/sec
Pattern found 1428510 times
Took 1276ms
425.838MBi/sec
Pattern found 1428510 times
Took 1260ms
431.245MBi/sec
Pattern found 1428510 times
Took 1243ms
437.143MBi/sec
Pattern found 1428510 times
Took 1242ms
437.495MBi/sec
Pattern found 1428510 times
Took 1271ms
427.513MBi/sec
Pattern found 1428510 times
Took 1239ms
438.555MBi/sec
Pattern found 1428510 times
Took 1237ms
439.264MBi/sec
---------- DFA ----------
Pattern found 1428510 times
Took 2959ms
183.633MBi/sec
Pattern found 1428510 times
Took 3002ms
181.002MBi/sec
Pattern found 1428510 times
Took 2985ms
182.033MBi/sec
Pattern found 1428510 times
Took 3022ms
179.805MBi/sec
Pattern found 1428510 times
Took 2979ms
182.4MBi/sec
Pattern found 1428510 times
Took 3021ms
179.864MBi/sec
Pattern found 1428510 times
Took 2946ms
184.443MBi/sec
Pattern found 1428510 times
Took 2949ms
184.255MBi/sec
Pattern found 1428510 times
Took 2971ms
182.891MBi/sec
Pattern found 1428510 times
Took 3009ms
180.581MBi/sec
===========================================
===========================================
Testing pattern "διακοπεί" on 7635717 lines...
---------- NO JIT ----------
Pattern found 10 times
Took 955ms
568.973MBi/sec
Pattern found 10 times
Took 979ms
555.025MBi/sec
Pattern found 10 times
Took 969ms
560.753MBi/sec
Pattern found 10 times
Took 975ms
557.302MBi/sec
Pattern found 10 times
Took 958ms
567.191MBi/sec
Pattern found 10 times
Took 942ms
576.825MBi/sec
Pattern found 10 times
Took 1025ms
530.116MBi/sec
Pattern found 10 times
Took 981ms
553.893MBi/sec
Pattern found 10 times
Took 989ms
549.413MBi/sec
Pattern found 10 times
Took 994ms
546.649MBi/sec
---------- JIT ----------
Pattern found 10 times
Took 983ms
552.766MBi/sec
Pattern found 10 times
Took 985ms
551.644MBi/sec
Pattern found 10 times
Took 1010ms
537.989MBi/sec
Pattern found 10 times
Took 990ms
548.858MBi/sec
Pattern found 10 times
Took 976ms
556.731MBi/sec
Pattern found 10 times
Took 977ms
556.161MBi/sec
Pattern found 10 times
Took 951ms
571.366MBi/sec
Pattern found 10 times
Took 964ms
563.661MBi/sec
Pattern found 10 times
Took 962ms
564.833MBi/sec
Pattern found 10 times
Took 952ms
570.766MBi/sec
---------- DFA ----------
Pattern found 10 times
Took 881ms
616.764MBi/sec
Pattern found 10 times
Took 874ms
621.704MBi/sec
Pattern found 10 times
Took 872ms
623.13MBi/sec
Pattern found 10 times
Took 881ms
616.764MBi/sec
Pattern found 10 times
Took 871ms
623.845MBi/sec
Pattern found 10 times
Took 876ms
620.285MBi/sec
Pattern found 10 times
Took 872ms
623.13MBi/sec
Pattern found 10 times
Took 885ms
613.977MBi/sec
Pattern found 10 times
Took 875ms
620.993MBi/sec
Pattern found 10 times
Took 860ms
631.825MBi/sec
===========================================
===========================================
Testing pattern "[0-9]{2},[0-9]{2}" on 7635717 lines...
---------- NO JIT ----------
Pattern found 2804861 times
Took 1813ms
299.707MBi/sec
Pattern found 2804861 times
Took 1822ms
298.227MBi/sec
Pattern found 2804861 times
Took 1785ms
304.409MBi/sec
Pattern found 2804861 times
Took 1808ms
300.536MBi/sec
Pattern found 2804861 times
Took 1797ms
302.376MBi/sec
Pattern found 2804861 times
Took 1787ms
304.068MBi/sec
Pattern found 2804861 times
Took 1790ms
303.558MBi/sec
Pattern found 2804861 times
Took 1829ms
297.085MBi/sec
Pattern found 2804861 times
Took 1798ms
302.208MBi/sec
Pattern found 2804861 times
Took 1796ms
302.544MBi/sec
---------- JIT ----------
Pattern found 2804861 times
Took 1806ms
300.869MBi/sec
Pattern found 2804861 times
Took 1776ms
305.951MBi/sec
Pattern found 2804861 times
Took 1779ms
305.435MBi/sec
Pattern found 2804861 times
Took 1821ms
298.391MBi/sec
Pattern found 2804861 times
Took 1862ms
291.82MBi/sec
Pattern found 2804861 times
Took 1830ms
296.923MBi/sec
Pattern found 2804861 times
Took 1888ms
287.802MBi/sec
Pattern found 2804861 times
Took 1844ms
294.669MBi/sec
Pattern found 2804861 times
Took 1865ms
291.351MBi/sec
Pattern found 2804861 times
Took 1802ms
301.537MBi/sec
---------- DFA ----------
Pattern found 2804861 times
Took 2091ms
259.861MBi/sec
Pattern found 2804861 times
Took 2214ms
245.424MBi/sec
Pattern found 2804861 times
Took 2091ms
259.861MBi/sec
Pattern found 2804861 times
Took 2084ms
260.734MBi/sec
Pattern found 2804861 times
Took 2125ms
255.703MBi/sec
Pattern found 2804861 times
Took 2173ms
250.055MBi/sec
Pattern found 2804861 times
Took 2107ms
257.888MBi/sec
Pattern found 2804861 times
Took 2064ms
263.26MBi/sec
Pattern found 2804861 times
Took 2217ms
245.092MBi/sec
Pattern found 2804861 times
Took 2148ms
252.965MBi/sec
===========================================
===========================================
Testing pattern "([α-ω]+)" on 7635717 lines...
---------- NO JIT ----------
Pattern found 194 times
Took 1460ms
372.171MBi/sec
Pattern found 194 times
Took 1445ms
376.034MBi/sec
Pattern found 194 times
Took 1484ms
366.152MBi/sec
Pattern found 194 times
Took 1581ms
343.687MBi/sec
Pattern found 194 times
Took 1562ms
347.868MBi/sec
Pattern found 194 times
Took 1551ms
350.335MBi/sec
Pattern found 194 times
Took 1494ms
363.701MBi/sec
Pattern found 194 times
Took 1532ms
354.68MBi/sec
Pattern found 194 times
Took 1467ms
370.395MBi/sec
Pattern found 194 times
Took 1464ms
371.154MBi/sec
---------- JIT ----------
Pattern found 194 times
Took 1565ms
347.201MBi/sec
Pattern found 194 times
Took 1440ms
377.34MBi/sec
Pattern found 194 times
Took 1461ms
371.916MBi/sec
Pattern found 194 times
Took 1477ms
367.887MBi/sec
Pattern found 194 times
Took 1428ms
380.511MBi/sec
Pattern found 194 times
Took 1437ms
378.128MBi/sec
Pattern found 194 times
Took 1433ms
379.183MBi/sec
Pattern found 194 times
Took 1456ms
373.193MBi/sec
Pattern found 194 times
Took 1457ms
372.937MBi/sec
Pattern found 194 times
Took 1442ms
376.816MBi/sec
---------- DFA ----------
Pattern found 194 times
Took 1388ms
391.476MBi/sec
Pattern found 194 times
Took 1375ms
395.178MBi/sec
Pattern found 194 times
Took 1345ms
403.992MBi/sec
Pattern found 194 times
Took 1344ms
404.293MBi/sec
Pattern found 194 times
Took 1347ms
403.392MBi/sec
Pattern found 194 times
Took 1340ms
405.499MBi/sec
Pattern found 194 times
Took 1347ms
403.392MBi/sec
Pattern found 194 times
Took 1343ms
404.594MBi/sec
Pattern found 194 times
Took 1367ms
397.49MBi/sec
Pattern found 194 times
Took 1343ms
404.594MBi/sec
===========================================
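
To compare settings without reading every run, the "Took" lines above can be averaged per pattern and mode. The following is a small Python sketch under the assumption that the log has exactly the line shapes shown above; `mean_took_ms` is a hypothetical helper, and sections without a mode line are grouped under a mode of `None`.

```python
import re
from statistics import mean

PATTERN_RE = re.compile(r'^Testing pattern "(.*)" on \d+ lines\.\.\.$')
MODE_RE = re.compile(r"^-{10} (.+) -{10}$")
TOOK_RE = re.compile(r"^Took (\d+)ms$")


def mean_took_ms(log_text: str) -> dict:
    """Average the 'Took Nms' values for each (pattern, mode) pair in a log."""
    samples = {}
    pattern = None
    mode = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = PATTERN_RE.match(line)
        if m:
            # A new pattern section starts; the mode resets until a mode line appears.
            pattern, mode = m.group(1), None
            continue
        m = MODE_RE.match(line)
        if m:
            mode = m.group(1)
            continue
        m = TOOK_RE.match(line)
        if m and pattern is not None:
            # Load times ("Took 0.74sec") do not match TOOK_RE and are skipped.
            samples.setdefault((pattern, mode), []).append(int(m.group(1)))
    return {key: mean(values) for key, values in samples.items()}
```

Each key of the returned dictionary is a `(pattern, mode)` tuple, so the NO JIT, JIT and DFA sections of one pattern end up as separate entries.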
@data-man

Is the benchmarking code available?
It would be nice to compare with C++ compile-time-regular-expressions.

@alexnask
Author

@data-man
The code is pretty bad; I will clean it up and release it eventually.
I plan to benchmark against the lib you posted, Dlang's ctRegex, as well as other good runtime engines like NodeJS's, hyperscan, etc.
Probably after I write the DFA matching though :P

@data-man

Wow, reactions don't work in gist comments, I didn't know that! :)

I plan to benchmark against the lib you posted, Dlang's ctRegex, as well as other good runtime engines like NodeJS's, hyperscan, etc.
Probably after I write the DFA matching though :P

👍
