Benchmark for Vim regexp engine performance

README
Regular expressions and data are taken from
http://lh3lh3.users.sourceforge.net/reb.shtml
 
Regular expressions benchmarked:
URI        ([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?
Email      ([^ @]+)@([^ @]+)
Date       ([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)
URI|Email  ([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)
Word       .*SCSI-
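As a quick sanity check of what each pattern selects, the sketch below compiles the five expressions with Python's re module (the same way script.py does) and tests them against a few sample lines. The sample lines are made-up assumptions for illustration, not excerpts from the howto corpus.

```python
import re

# The five benchmark patterns, copied verbatim from the list above.
PATTERNS = {
    "URI": r"([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?",
    "Email": r"([^ @]+)@([^ @]+)",
    "Date": r"([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)",
    "URI|Email": r"([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)",
    "Word": r".*SCSI-",
}

# Hypothetical sample lines, one aimed at each pattern plus a non-match.
SAMPLES = [
    "see http://www.example.com/docs for details",
    "mail user@example.org with questions",
    "released on 12/31/1999",
    "uses the SCSI-2 command set",
    "nothing interesting here",
]


def matching_lines(pattern, lines):
    """Return the lines in which pattern matches anywhere (unanchored, like script.py)."""
    reobj = re.compile(pattern)
    return [line for line in lines if reobj.search(line)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name}: {matching_lines(pattern, SAMPLES)}")
```

Each pattern picks out only its intended sample line, and URI|Email picks out both the URI and the Email lines.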
 
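Results (mean wall-clock time in seconds):

                    URI   Email   Date    Sum3  URI|Email   Word
Vim re=1          16.34   13.65   4.07   34.06      29.46   0.49
Vim re=2          92.03    9.75   4.47  106.25     105.39   5.22
Python 2.7.3       2.69    5.17   1.01    8.87       7.72   3.40
Perl 5.14.2        0.35    0.33   0.32    1.00       8.12   0.31
GNU egrep 2.10     0.21    0.16   0.56    0.93      10.86   0.03

Sum3 is the total of the URI, Email, and Date columns. "re=1" and "re=2" are
Vim's 'regexpengine' settings: 1 selects the old backtracking engine, 2 the NFA
engine. Each figure is the mean of five runs (Vim 7.3.1010, 64-bit Intel Core
i7-2700K CPU @ 3.50GHz x 8).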
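The Sum3 column is the sum of the URI, Email, and Date timings. The short sketch below re-derives it from the table and expresses each row's Sum3 as a multiple of the GNU egrep Sum3. The timings are copied from the table; the ratios are plain arithmetic on them, not additional measurements, and the function names are hypothetical.

```python
# Timings in seconds copied from the results table: (URI, Email, Date, Sum3).
RESULTS = {
    "Vim re=1": (16.34, 13.65, 4.07, 34.06),
    "Vim re=2": (92.03, 9.75, 4.47, 106.25),
    "Python 2.7.3": (2.69, 5.17, 1.01, 8.87),
    "Perl 5.14.2": (0.35, 0.33, 0.32, 1.00),
    "GNU egrep 2.10": (0.21, 0.16, 0.56, 0.93),
}


def check_sum3(results, tol=0.005):
    """True if every Sum3 equals URI + Email + Date, allowing for rounding."""
    return all(
        abs(uri + email + date - sum3) <= tol
        for uri, email, date, sum3 in results.values()
    )


def slowdown(results, baseline="GNU egrep 2.10"):
    """Each row's Sum3 divided by the baseline row's Sum3, rounded to 0.1."""
    base = results[baseline][3]
    return {name: round(row[3] / base, 1) for name, row in results.items()}
```

With these figures, Sum3 for re=1 comes out at about 36.6 times the egrep total, and for re=2 at about 114.2 times.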
 
The Vim results were obtained with the bench.sh script.
 
Python, Perl, and egrep were timed in a similar fashion, using these invocations:
 
perl script.pl 'pattern' </path/to/data/howto >/dev/null
python script.py 'pattern' </path/to/data/howto >/dev/null
egrep 'pattern' /path/to/data/howto >/dev/null
 
The data file "howto" (~38 MB) is available, bzip2-compressed, at
http://people.unipmn.it/manzini/lightweight/corpus/howto.bz2
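The scripts read the uncompressed howto file, so the download has to be unpacked first; `bunzip2 howto.bz2` does this from the shell. A minimal Python equivalent using the standard bz2 module is sketched below, assuming howto.bz2 has already been downloaded into the current directory (the default file names and the helper name are assumptions).

```python
import bz2
import shutil


def decompress(src="howto.bz2", dst="howto"):
    """Stream-decompress src into dst without holding the whole file in memory."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copies in fixed-size chunks
    return dst
```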
bench.sh
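#!/bin/bash
# Usage: ./bench.sh <engine> <script>
# where engine is (1|2), default 1
# script is (uri|email|date|uriemail|word), default word
# Prints the mean wall-clock time in seconds over RUNS runs.

VIM="/path/to/vim/src/vim"
DATA="/path/to/data/howto"
RUNS=5

vimrc="vimrc-${1:-1}"
rescript="re-${2:-word}.vim"
# -N: nocompatible, -u: engine-selecting vimrc, -i NONE: no viminfo,
# -n: no swap file, -e -s: silent Ex mode, -S: source the :g script
cmd=( "${VIM}" -N -u "${vimrc}" -i NONE -n -e -s -S "${rescript}" +quit "${DATA}" )

echo "${cmd[@]}" >&2

tmpfile=$(mktemp) || exit 1
for (( i = 1; i <= RUNS; i++ )); do
    # \time bypasses the bash keyword and runs GNU time, which appends
    # each run's elapsed seconds (%e) to ${tmpfile}
    \time -f '%e' -a -o "${tmpfile}" "${cmd[@]}" &>/dev/null
    echo -n . >&2
done
echo >&2

result=$( awk '{ sum += $1 } END { printf "%.2f", sum / NR }' "${tmpfile}" )

rm -f "${tmpfile}"

echo "${result}"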
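bench.sh reports the mean wall-clock time over five runs as measured by GNU time's %e. Where GNU time is not available, a rough Python equivalent is sketched below. It times each run with time.perf_counter around subprocess.run, so its figures include Python's process-spawning overhead and will not match /usr/bin/time exactly; the function names are hypothetical.

```python
import subprocess
import sys
import time


def _time_once(cmd, stdin):
    """Run cmd once with the given stdin, discarding its output; return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdin=stdin, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def mean_runtime(cmd, runs=5, stdin_path=None):
    """Mean wall-clock seconds over `runs` executions, rounded to two decimals."""
    elapsed = []
    for _ in range(runs):
        if stdin_path is None:
            elapsed.append(_time_once(cmd, subprocess.DEVNULL))
        else:
            # Reopen the input each run so every run reads the whole file.
            with open(stdin_path, "rb") as f:
                elapsed.append(_time_once(cmd, f))
        print(".", end="", file=sys.stderr, flush=True)  # progress dot, as in bench.sh
    print(file=sys.stderr)
    return round(sum(elapsed) / runs, 2)
```

For example, `mean_runtime(["perl", "script.pl", "pattern"], stdin_path="/path/to/data/howto")` mirrors the Perl invocation shown in the README.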
re-date.vim
g/\%([0-9][0-9]\=\)\/\%([0-9][0-9]\=\)\/\%([0-9][0-9]\%([0-9][0-9]\)\=\)/p
re-email.vim
g/\%([^ @]\+\)@\%([^ @]\+\)/p
re-uri.vim
g/\%([a-zA-Z][a-zA-Z0-9]*\):\/\/\%([^ /]\+\)\%(\/[^ ]*\)\=/p
re-uriemail.vim
g/\%([a-zA-Z][a-zA-Z0-9]*\):\/\/\%([^ /]\+\)\%(\/[^ ]*\)\=\|\%([^ @]\+\)@\%([^ @]\+\)/p
re-word.vim
g/.*SCSI-/p
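The re-*.vim scripts express the same patterns in Vim's default ("magic") syntax: \%(...\) is a non-capturing group, \= means zero or one, \+ means one or more, \| is alternation, and / is written \/ because it delimits the :g command. The hypothetical helper below rewrites only those constructs (plus plain \(...\) groups and [...] collections) into Python syntax, which makes it easy to check a Vim pattern against the README's version. It is a sketch for this subset, not a general Vim-regex converter, and it raises ValueError on any other escape.

```python
import re

# Vim (magic) escapes used by the re-*.vim scripts -> Python equivalents.
_VIM_ESCAPES = {"(": "(", ")": ")", "=": "?", "+": "+", "|": "|", "/": "/"}


def vim_to_python(pattern):
    r"""Translate the subset of Vim magic syntax: \%( \( \) \= \+ \| \/ and [...]."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if pattern.startswith("\\%(", i):
            out.append("(?:")  # non-capturing group
            i += 3
        elif c == "\\":
            nxt = pattern[i + 1:i + 2]
            if nxt not in _VIM_ESCAPES:
                raise ValueError(f"unsupported escape \\{nxt} at index {i}")
            out.append(_VIM_ESCAPES[nxt])
            i += 2
        elif c == "[":
            # Copy a collection unchanged; assumes it contains no literal ']'.
            end = pattern.index("]", i + 1)
            out.append(pattern[i:end + 1])
            i = end + 1
        elif c in "(){}+?|":
            out.append("\\" + c)  # literal in Vim magic mode, special in Python
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


def vim_search(vim_pattern, line):
    """True if the translated pattern matches anywhere in line, as :g/pattern/ tests it."""
    return re.search(vim_to_python(vim_pattern), line) is not None
```

For example, the re-uri.vim pattern translates to `(?:[a-zA-Z][a-zA-Z0-9]*)://(?:[^ /]+)(?:/[^ ]*)?`, the README's URI expression with its groups made non-capturing.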
script.pl
#!/usr/bin/env perl
use strict;
use warnings;
 
my $reobj = qr/$ARGV[0]/;
 
while (<STDIN>) {
    print $_ if /$reobj/;
}
script.py
#!/usr/bin/env python
import re
import sys
 
reobj = re.compile(sys.argv[1])
 
for line in sys.stdin:
    if reobj.search(line):
        sys.stdout.write(line)
vimrc-1
if exists('&regexpengine')
  set regexpengine=1
endif
vimrc-2
if exists('&regexpengine')
  set regexpengine=2
endif
