Skip to content

Instantly share code, notes, and snippets.

@pipcet
Forked from calid/00-preamble.md
Last active August 29, 2015 14:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pipcet/1644cbd05e3300e5cec4 to your computer and use it in GitHub Desktop.
Save pipcet/1644cbd05e3300e5cec4 to your computer and use it in GitHub Desktop.

ØMQ Perl Performance Comparison: FFI vs XS bindings

Comparison of the performance of FFI vs XS zeromq bindings. For FFI the ZMQ::FFI bindings are used, first using FFI::Raw on the backend and then using FFI::Platypus. For XS ZMQ::LibZMQ3 is used.

Comparison is done using the zeromq weather station example, first by timing wuclient.pl using the various implementations, and then by profiling wuserver.pl using Devel::NYTProf. When profiling the server is changed to simply publish 1 million messages and exit.

Weather station example code was lightly optimized (e.g. don't declare vars in loop) and modified to be more consistent.

Additionally, a more direct benchmark and comparison of FFI::Platypus vs XS xsubs is also done.

C and Python implementation results are provided as a baseline for performance.

All the code that was created or modified for these benchmarks is listed at the end (C/Python wuclient/wuserver code can be found in the zmq guide).

Test box

CPU:  Intel Core Quad i7-2600K CPU @ 3.40GHz
Mem:  4GB
OS:   Arch Linux
ZMQ:  4.0.5
Perl: 5.20.1

ZMQ::FFI      = 0.19 (FFI::Raw backend), dev (FFI::Platypus backend)
FFI::Raw      = 0.32
FFI::Platypus = 0.31
ZMQ::LibZMQ3  = 1.19

wuclient.pl Time Comparison

FFI::Raw Implementation

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather station...
Average temperature for zipcode '10001 ' was 21F

real    1m22.818s
user    0m0.070s
sys     0m0.023s

FFI::Platypus Implementation

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather station...
Average temperature for zipcode '10001 ' was 38F

real    0m12.813s
user    0m0.083s
sys     0m0.033s

XS Implementation (ZMQ::LibZMQ3)

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 34F

real    0m10.051s
user    0m0.017s
sys     0m0.010s

C Reference Implementation

$ ./wuserver &
$ time ./wuclient
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 26F

real    0m2.842s
user    0m0.000s
sys     0m0.023s

Python Reference Implementation

I was initially impressed with the performance of the Python example:

$ python -V
Python 3.4.2
$ python -c 'import zmq; print(zmq.pyzmq_version())'
14.5.0

$ python wuserver.py &
$ time python wuclient.py
Collecting updates from weather server...
Average temperature for zipcode '10001' was 49F

real    0m4.599s
user    0m0.063s
sys     0m0.020s

Wow, that's almost as fast as C! But then I noticed:

# Process 5 updates
total_temp = 0
for update_nbr in range(5)
    ...

So where the C and Perl implementations are processing 100 updates, the Python version only processes 5, or 1/20 as many. What about if we use 100 updates like the other languages?

$ python wuserver.py &
$ time python wuclient.py
Collecting updates from weather server...
Average temperature for zipcode '10001' was 17F

real    1m41.108s
user    0m0.077s
sys     0m0.017s

If nothing else, at least the Perl bindings blow the doors off the Python ones :)

wuserver.pl Hot Spot Comparison (Devel::NYTProf)

FFI::Raw Implementation

$self->_zmq3_ffi->{zmq_send}->($self->_socket, $msg, $length, $flags)
# spent 19.9s making 1000000 calls to FFI::Raw::__ANON__[FFI/Raw.pm:94], avg 20µs/call
# spent 5.72s making 2000000 calls to FFI::Raw::coderef, avg 3µs/call
# spent 2.90s making 1000000 calls to ZMQ::FFI::ZMQ3::Socket::_zmq3_ffi, avg 3µs/call

FFI::Platypus Implementation

zmq_send($socket, $msg, $length, $flags)
# spent 1.33s making 1000000 calls to ZMQ::FFI::ZMQ3::Socket::zmq_send, avg 1µs/call

sub ZMQ::FFI::ZMQ3::Socket::zmq_send; # xsub

XS Implementation (ZMQ::LibZMQ3)

zmq_send($socket, $string, -1);
# spent 1.23s making 1000000 calls to ZMQ::LibZMQ3::zmq_send, avg 1µs/call

sub ZMQ::LibZMQ3::zmq_send; # xsub

Direct xsub Comparison

The weather station example inevitably has layers between sending the messages and the underlying xsub calls. This is fine for comparing the two high level APIs ZMQ::FFI vs ZMQ::LibZMQ3, but we also want to compare the FFI::Platypus vs XS xsub performance directly.

So as much as possible strip out intervening layers to determine the raw performance of the two.

Benchmark.pm results

$ perl zmq-bench.pl
FFI ZMQ Version: 4.0.5
XS  ZMQ Version: 4.0.5

Benchmark: timing 10000000 iterations of FFI, XS...
       FFI:  4 wallclock secs ( 3.31 usr +  0.01 sys =  3.32 CPU) @ 3012048.19/s (n=10000000)
        XS:  2 wallclock secs ( 2.16 usr +  0.00 sys =  2.16 CPU) @ 4629629.63/s (n=10000000)

         Rate   FFI    XS     C
FFI 3012048/s    --  -35%  -82%
XS  4629630/s   54%    --  -73%
C* 16835017/s  559%  364%    --

*just 'faking' the C results below into the table so it's easy to compare a baseline

$ time zmq-bench-c
C ZMQ Version: 4.0.5

real    0m0.594s
user    0m0.570s
sys     0m0.017s

$ echo '10000000 / 0.594' | bc -lq
16835016.835 # Rate

Devel::NYTProf profiling results

For profiling and timing in the shell below send in a for loop instead of via Benchmark

sub main::zmqffi_send; # xsub
# spent 15.5s within main::zmqffi_send which was called 10000000 times, avg 2µs/call

sub ZMQ::LibZMQ3::zmq_send; # xsub
# spent 15.6s within ZMQ::LibZMQ3::zmq_send which was called 10000000 times, avg 2µs/call

Q: Why does the profiler indicate basically identical performance of the xsubs, but Benchmark reports performance difference?

A: ???

Time in shell

$ time perl zmq-bench.pl
FFI ZMQ Version: 4.0.5

real    0m3.541s
user    0m3.510s
sys     0m0.027s

$ echo '10000000 / 3.541' | bc -lq
2824060.999 # Rate

$ time perl zmq-bench.pl
XS ZMQ Version: 4.0.5

real    0m2.390s
user    0m2.363s
sys     0m0.020s

$ echo '10000000 / 2.390' | bc -lq
4184100.418 # Rate

XS is 48% faster when timing on the shell.

oprofile_data: zmq-bench.pl
operf --callgraph perl ./zmq-bench.pl
call-graph: oprofile_data
opreport --callgraph > call-graph
nytprof.out: zmq-bench.pl
perl -d:NYTProf ./zmq-bench.pl
opannotate.c: oprofile_data
opannotate --source > opannotate.c
bench.txt: zmq-bench.pl
perl ./zmq-bench.pl > bench.txt
#
# Weather update client
# Connects SUB socket to tcp://localhost:5556
# Collects weather updates and finds avg temp in zipcode
#
import sys
import zmq
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
print("Collecting updates from weather server...")
socket.connect("tcp://localhost:5556")
# Subscribe to zipcode, default is NYC, 10001
zip_filter = sys.argv[1] if len(sys.argv) > 1 else "10001"
# Python 2 - ascii bytes to unicode str
if isinstance(zip_filter, bytes):
zip_filter = zip_filter.decode('ascii')
socket.setsockopt_string(zmq.SUBSCRIBE, zip_filter)
# Process 5 updates
total_temp = 0
for update_nbr in range(5):
string = socket.recv_string()
zipcode, temperature, relhumidity = string.split()
total_temp += int(temperature)
print("Average temperature for zipcode '%s' was %dF" % (
zip_filter, total_temp / update_nbr)
)
#
# Weather update server
# Binds PUB socket to tcp://*:5556
# Publishes random weather updates
#
import zmq
from random import randrange
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
while True:
zipcode = randrange(1, 100000)
temperature = randrange(-80, 135)
relhumidity = randrange(10, 60)
socket.send_string("%i %i %i" % (zipcode, temperature, relhumidity))
#
# should run for about 50 seconds.
#
use strict;
use warnings;
use v5.10;
use ZMQ::LibZMQ3;
use ZMQ::FFI::Constants qw(:all);
my $rv;
my $xs_ctx = zmq_ctx_new();
die 'xs ctx error' unless $xs_ctx;
my $xs_socket = zmq_socket($xs_ctx, ZMQ_PUB);
die 'xs socket error' unless $xs_socket;
$rv = zmq_bind($xs_socket, "ipc:///tmp/zmq-xs-bench-$$");
die 'xs bind error' if $rv == -1;
my $i;
while(1) {
$i++;
zmq_send($xs_socket, 'ohhai', 5, 0);
exit if $i == 100_000_000;
}
#include <zmq.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>
#include <string.h>
int main(void)
{
void *ctx = zmq_ctx_new();
assert(ctx);
void *socket = zmq_socket(ctx, ZMQ_PUB);
assert(socket);
pid_t p = getpid();
char *endpoint = malloc(256);
sprintf(endpoint, "ipc:///tmp/zmq-c-bench-%d", p);
assert( -1 != zmq_bind(socket, endpoint) );
int major, minor, patch;
zmq_version(&major, &minor, &patch);
printf("C ZMQ Version: %d.%d.%d\n", major, minor, patch);
for ( int i = 0; i < (10 * 1000 * 1000); i++ ) {
assert( -1 != zmq_send(socket, "ohhai", 5, 0) );
}
}
#
# Directly compare FFI::Platypus vs XS xsubs
#
use strict;
use warnings;
use v5.10;
use FFI::Platypus::Declare;
use ZMQ::LibZMQ3;
use ZMQ::FFI::Constants qw(:all);
use Benchmark qw(:all);
lib 'libzmq.so';
attach(
['zmq_ctx_new' => 'zmqffi_ctx_new']
=> [] => 'pointer'
);
attach(
['zmq_socket' => 'zmqffi_socket']
=> ['pointer', 'int'] => 'pointer'
);
attach(
['zmq_bind' => 'zmqffi_bind']
=> ['pointer', 'string'] => 'int'
);
package FFIsock;
use ZMQ::LibZMQ3;
sub new {
return bless [], $_[0];
}
my $ffi = FFI::Platypus->new;
my $sockobj = FFIsock->new;
$ffi->lib('libzmq.so');
$ffi->attach_method([$ffi],
['zmq_send' => 'ffi']
=> ['pointer', 'string', 'size_t', 'int'] => 'int'
);
$ffi->attach_method(['FFIsock'], ['zmq_send' => 'ffi2']
=> ['pointer', 'string', 'size_t', 'int'] => 'int');
my $ffi_ctx = main::zmqffi_ctx_new();
die 'ffi ctx error' unless $ffi_ctx;
use ZMQ::FFI::Constants qw(:all);
my $ffi_socket = main::zmqffi_socket($ffi_ctx, ZMQ_PUB);
die 'ffi socket error' unless $ffi_socket;
$ffi->attach_method([$sockobj=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
package main;
attach(
['zmq_send' => 'ffi2']
=> ['pointer', 'string', 'size_t', 'int'] => 'int',
);
attach(
['zmq_version' => 'zmqffi_version']
=> ['int*', 'int*', 'int*'] => 'void'
);
$ffi->attach_method([$sockobj=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
my $ffi_hash = { socket => $ffi_socket };
my $rv;
$rv = zmqffi_bind($ffi_socket, "ipc:///tmp/zmq-ffi-bench-$$");
die 'ffi bind error' if $rv == -1;
my $xs_ctx = zmq_ctx_new();
die 'xs ctx error' unless $xs_ctx;
my $xs_socket = zmq_socket($xs_ctx, ZMQ_PUB);
die 'xs socket error' unless $xs_socket;
$rv = zmq_bind($xs_socket, "ipc:///tmp/zmq-xs-bench-$$");
die 'xs bind error' if $rv == -1;
my ($major, $minor, $patch);
zmqffi_version(\$major, \$minor, \$patch);
say "FFI ZMQ Version: " . join(".", $major, $minor, $patch);
say "XS ZMQ Version: " . join(".", ZMQ::LibZMQ3::zmq_version());
use bytes;
use Inline C => qq{
typedef int (*send_t)(void *, const char *, long, int);
void loop_Inline(void *send, void *socket, const char *data, long size, int flags, void *die)
{
send_t s = send;
void (*d)(void) = die;
int i;
for(i=0; i<100*1000*1000; i++) {
if(s(socket, data, size, flags) == 1)
d();
}
}
};
use FFI::TinyCC;
my $tcc = FFI::TinyCC->new;
$tcc->compile_string(q{
void
loop(int (*f)(void *, const char *, long, int), void *arg0, const char *arg1, long arg2, int arg3, void (*die)(void))
{
int i;
for(i=0; i<100*1000*1000; i++)
if(f(arg0, arg1, arg2, arg3) == -1)
die();
}
});
my $address = $tcc->get_symbol('loop');
lib 'libzmq.so';
my $zmqsend = sub { FFI::Platypus::Declare::_ffi_object }->()->find_symbol('zmq_send');
type('(opaque, string, long, int)->int', 'f_closure');
type('()->void', 'die_closure');
attach([$address => 'loop'] => [qw(f_closure opaque string long int die_closure)] => 'void');
my $r3 = timethese 1, {
TinyCC => sub {
my $die_closure = closure { die "zmq_send error"};
loop($zmqsend, $ffi_socket, 'ohhai', 5, 0, $die_closure);
},
Inline => sub {
my $die_closure = closure { die "zmq_send error" };
loop_Inline($zmqsend, $ffi_socket, 'ohhai', 5, 0, $die_closure);
},
Python => sub {
# this is a little unfair, since there's overhead for starting
# python and waiting for it, but that's on the order of a tenth of
# a second ...
system("python ./zmq-bench.py");
},
'Perl exec' => sub {
system("perl ./zmq-bench-selfcontained.pl");
}
};
my $r = timethese 10_000_000, {
'class method' => sub {
die 'ffi send error' if -1 == FFIsock->ffi2($ffi_socket, 'ohhai', 5, 0);
},
'class method(hash)' => sub {
die 'ffi send error' if -1 == FFIsock->ffi2($ffi_hash->{socket}, 'ohhai', 5, 0);
},
'method' => sub {
die 'ffi send error' if -1 == $sockobj->ffio('ohhai', 5, 0);
},
'xsub' => sub {
die 'ffi send error' if -1 == ffi2($ffi_socket, 'ohhai', 5, 0);
},
'xsub(hash)' => sub {
die 'ffi send error' if -1 == ffi2($ffi_hash->{socket}, 'ohhai', 5, 0);
},
'XS' => sub {
die 'xs send error ' if -1 == zmq_send($xs_socket, 'ohhai', 5, 0);
},
};
for my $key (keys %$r3)
{
$r->{$key} = $r3->{$key};
# HACK! we're accessing the Benchmark object's internal struct
$r->{$key}->[5] = 100_000_000;
}
cmpthese($r);
#
# Weather update client
# Connects SUB socket to tcp://localhost:5556
# Collects weather updates and finds avg temp in zipcode
#
import sys
import zmq
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind('ipc:///tmp/zmq-py-bench')
buf = bytes('ohhai')
i=0
while True:
i+=1
socket.send(buf, 0)
if i == 100*1000*1000:
exit(0)
use strict;
use warnings;
use v5.10;
use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_SUB);
say "Collecting updates from weather station...";
my $context = ZMQ::FFI->new();
my $subscriber = $context->socket(ZMQ_SUB);
$subscriber->connect("tcp://localhost:5556");
my $filter = $ARGV[0] // "10001 ";
$subscriber->subscribe($filter);
my $update_nbr = 100;
my $total_temp = 0;
my ($string, $zipcode, $temperature, $relhumidity);
for (1..$update_nbr) {
$string = $subscriber->recv();
($zipcode, $temperature, $relhumidity) = split ' ', $string;
$total_temp += $temperature;
}
printf "Average temperature for zipcode '%s' was %dF\n",
$filter, int($total_temp / $update_nbr);
use strict;
use warnings;
use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_PUB);
my $context = ZMQ::FFI->new();
my $publisher = $context->socket(ZMQ_PUB);
$publisher->bind("tcp://*:5556");
my ($zipcode, $temperature, $relhumidity, $update);
# for (1..1_000_000) { # publish constant number when profiling
while (1) {
$zipcode = rand(100_000);
$temperature = rand(215) - 80;
$relhumidity = rand(50) + 10;
$update = sprintf(
'%05d %d %d',
$zipcode,$temperature,$relhumidity
);
$publisher->send($update);
}
use strict;
use warnings;
use v5.10;
use ZMQ::LibZMQ3;
use ZMQ::Constants qw(ZMQ_SUB ZMQ_SUBSCRIBE);
use zhelpers;
say 'Collecting updates from weather server...';
my $context = zmq_init();
my $subscriber = zmq_socket($context, ZMQ_SUB);
zmq_connect($subscriber, 'tcp://localhost:5556');
my $filter = @ARGV ? $ARGV[0] : '10001 ';
zmq_setsockopt($subscriber, ZMQ_SUBSCRIBE, $filter);
my $update_nbr = 100;
my $total_temp = 0;
my ($string, $zipcode, $temperature, $relhumidity);
for (1 .. $update_nbr) {
$string = s_recv($subscriber);
($zipcode, $temperature, $relhumidity) = split ' ', $string;
$total_temp += $temperature;
}
printf "Average temperature for zipcode '%s' was %dF\n",
$filter, int($total_temp / $update_nbr);
use strict;
use warnings;
use ZMQ::LibZMQ3;
use ZMQ::Constants qw(ZMQ_PUB);
use zhelpers;
my $context = zmq_init();
my $publisher = zmq_socket($context, ZMQ_PUB);
zmq_bind($publisher, 'tcp://*:5556');
my ($zipcode, $temperature, $relhumidity, $update);
# for (1..1_000_000) { # publish constant number when profiling
while (1) {
$zipcode = rand(100_000);
$temperature = rand(215) - 80;
$relhumidity = rand(50) + 10;
$update = sprintf(
'%05d %d %d',
$zipcode,$temperature,$relhumidity
);
s_send($publisher, $update);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment