calid/00-preamble.md

## 00-preamble.md

      
ØMQ Perl Performance Comparison: FFI vs XS bindings

Comparison of the performance of FFI vs XS ZeroMQ bindings for Perl.  For FFI, the
ZMQ::FFI bindings are used, first with FFI::Raw as the backend and then
with FFI::Platypus.  For XS, ZMQ::LibZMQ3 is used.

The comparison uses the ZeroMQ weather station example, first by timing
wuclient.pl under each implementation, and then by profiling
wuserver.pl with Devel::NYTProf.  When profiling, the server is changed to
simply publish 1 million messages and exit.

The weather station example code was lightly optimized (e.g. variables are not
declared inside the loop) and modified to be more consistent across implementations.

Additionally, a more direct benchmark and comparison of FFI::Platypus vs XS
xsubs is included.

C and Python implementation results are provided as a performance baseline.

All the code that was created or modified for these benchmarks is listed at
the end (the C/Python wuclient/wuserver code can be found in the zmq guide).

Test box

CPU:  Intel Core Quad i7-2600K CPU @ 3.40GHz
Mem:  4GB
OS:   Arch Linux
ZMQ:  4.0.5
Perl: 5.20.1

ZMQ::FFI      = 0.19 (FFI::Raw backend), dev (FFI::Platypus backend)
FFI::Raw      = 0.32
FFI::Platypus = 0.31
ZMQ::LibZMQ3  = 1.19


## 01-wuclient-comparison.md

      
wuclient.pl Time Comparison

FFI::Raw Implementation

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather station...
Average temperature for zipcode '10001 ' was 21F

real    1m22.818s
user    0m0.070s
sys     0m0.023s

FFI::Platypus Implementation

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather station...
Average temperature for zipcode '10001 ' was 38F

real    0m12.813s
user    0m0.083s
sys     0m0.033s

XS Implementation (ZMQ::LibZMQ3)

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 34F

real    0m10.051s
user    0m0.017s
sys     0m0.010s

C Reference Implementation

$ ./wuserver &
$ time ./wuclient
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 26F

real    0m2.842s
user    0m0.000s
sys     0m0.023s

Python Reference Implementation

I was initially impressed with the performance of the Python example:
$ python -V
Python 3.4.2
$ python -c 'import zmq; print(zmq.pyzmq_version())'
14.5.0

$ python wuserver.py &
$ time python wuclient.py
Collecting updates from weather server...
Average temperature for zipcode '10001' was 49F

real    0m4.599s
user    0m0.063s
sys     0m0.020s

Wow, that's almost as fast as C!  But then I noticed:

# Process 5 updates
total_temp = 0
for update_nbr in range(5):
    ...

So where the C and Perl implementations process 100 updates, the Python
version processes only 5, or 1/20 as many. What happens if we use 100
updates like the other languages?

$ python wuserver.py &
$ time python wuclient.py
Collecting updates from weather server...
Average temperature for zipcode '10001' was 17F

real    1m41.108s
user    0m0.077s
sys     0m0.017s

If nothing else, at least the Perl bindings blow the doors off the Python
ones :)
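
To put the wall-clock numbers above side by side, the sketch below expresses each `real` time as a multiple of the C client's time. The timings are copied from the runs above; the dictionary layout and variable names are only illustrative. It also scales the original 5-update Python run by 20 as a rough estimate for 100 updates, which assumes the time grows linearly with the number of updates.

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
# The Python entry is the run modified to process 100 updates.
real_seconds = {
    "FFI::Raw": 82.818,
    "FFI::Platypus": 12.813,
    "XS (ZMQ::LibZMQ3)": 10.051,
    "C": 2.842,
    "Python (100 updates)": 101.108,
}

baseline = real_seconds["C"]

# Express every implementation as a multiple of the C client's time.
relative_to_c = {name: secs / baseline for name, secs in real_seconds.items()}

for name, ratio in sorted(relative_to_c.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {real_seconds[name]:8.3f}s  {ratio:5.1f}x C")

# Rough linear extrapolation of the original 5-update Python run (4.599s)
# to 100 updates; an estimate only, since startup cost does not scale.
python_5_updates = 4.599
python_estimate_100 = python_5_updates * (100 / 5)
print(f"Python extrapolated to 100 updates: {python_estimate_100:.1f}s")
```

The extrapolated figure lands in the same range as the measured 100-update run, which fits the observation that the 5-update result was not comparable to the others.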

  
## 02-wuserver-comparison.md

      
wuserver.pl Hot Spot Comparison (Devel::NYTProf)

FFI::Raw Implementation

$self->_zmq3_ffi->{zmq_send}->($self->_socket, $msg, $length, $flags)
# spent 19.9s making 1000000 calls to FFI::Raw::__ANON__[FFI/Raw.pm:94], avg 20µs/call
# spent 5.72s making 2000000 calls to FFI::Raw::coderef, avg 3µs/call
# spent 2.90s making 1000000 calls to ZMQ::FFI::ZMQ3::Socket::_zmq3_ffi, avg 3µs/call

FFI::Platypus Implementation

zmq_send($socket, $msg, $length, $flags)
# spent 1.33s making 1000000 calls to ZMQ::FFI::ZMQ3::Socket::zmq_send, avg 1µs/call

sub ZMQ::FFI::ZMQ3::Socket::zmq_send; # xsub

XS Implementation (ZMQ::LibZMQ3)

zmq_send($socket, $string, -1);
# spent 1.23s making 1000000 calls to ZMQ::LibZMQ3::zmq_send, avg 1µs/call

sub ZMQ::LibZMQ3::zmq_send; # xsub
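
As a rough way to read the profiler lines above, the sketch below sums the three FFI::Raw call sites on the `zmq_send` line and compares the total with the single xsub call made by the other two implementations. The times and call counts are copied from the NYTProf output; treating the three FFI::Raw lines as additive is an assumption made for illustration, and profiler overhead is ignored.

```python
MESSAGES = 1_000_000  # messages published during the profiling run

# Seconds spent per call site on the zmq_send line, from the NYTProf output above.
ffi_raw_sites = {
    "FFI::Raw::__ANON__": 19.9,
    "FFI::Raw::coderef": 5.72,
    "ZMQ::FFI::ZMQ3::Socket::_zmq3_ffi": 2.90,
}
platypus_seconds = 1.33
xs_seconds = 1.23


def per_message_us(total_seconds, messages=MESSAGES):
    """Average microseconds spent per published message."""
    return total_seconds / messages * 1e6


# Assumption: the three FFI::Raw call sites add up to the cost of one send.
ffi_raw_total = sum(ffi_raw_sites.values())

print(f"FFI::Raw      {per_message_us(ffi_raw_total):6.2f} us/msg")
print(f"FFI::Platypus {per_message_us(platypus_seconds):6.2f} us/msg")
print(f"XS            {per_message_us(xs_seconds):6.2f} us/msg")

raw_vs_platypus = ffi_raw_total / platypus_seconds
platypus_vs_xs = platypus_seconds / xs_seconds
print(f"FFI::Raw / FFI::Platypus: {raw_vs_platypus:.1f}x")
print(f"FFI::Platypus / XS:       {platypus_vs_xs:.2f}x")
```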


## 03-misc-benchmark-comparison.md

      
Direct xsub Comparison

The weather station example inevitably has layers between sending the messages
and the underlying xsub calls. This is fine for comparing the two high-level
APIs, ZMQ::FFI vs ZMQ::LibZMQ3, but we also want to compare
FFI::Platypus vs XS xsub performance directly.

So the benchmark below strips out as many intervening layers as possible to
measure the raw performance of the two.

Benchmark.pm results

$ perl zmq-bench.pl
FFI ZMQ Version: 4.0.5
XS  ZMQ Version: 4.0.5

Benchmark: timing 10000000 iterations of FFI, XS...
       FFI:  4 wallclock secs ( 3.31 usr +  0.01 sys =  3.32 CPU) @ 3012048.19/s (n=10000000)
        XS:  2 wallclock secs ( 2.16 usr +  0.00 sys =  2.16 CPU) @ 4629629.63/s (n=10000000)

         Rate   FFI    XS     C
FFI 3012048/s    --  -35%  -82%
XS  4629630/s   54%    --  -73%
C* 16835017/s  459%  264%    --

* The C row is not Benchmark output; it is added by hand from the timing below so there is a baseline to compare against.

$ time zmq-bench-c
C ZMQ Version: 4.0.5

real    0m0.594s
user    0m0.570s
sys     0m0.017s

$ echo '10000000 / 0.594' | bc -lq
16835016.835 # Rate
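
The comparison table follows the `cmpthese` convention from Benchmark: each cell is the row's rate relative to the column's rate, expressed as a percentage difference, i.e. 100 * (row / column - 1). The sketch below recomputes the matrix from the three rates reported above. The function name is hypothetical, and the C rate is the hand-computed one from the shell timing rather than a Benchmark result.

```python
# Iterations per second, from the Benchmark output and the C timing above.
rates = {
    "FFI": 3012048.19,
    "XS": 4629629.63,
    "C": 16835016.835,
}


def percent_difference(row_rate, col_rate):
    """How much faster (+) or slower (-) the row is than the column, in percent."""
    return 100.0 * (row_rate / col_rate - 1.0)


names = list(rates)

# Header row, then one line per implementation.
print(" " * 4 + "".join(f"{name:>8s}" for name in names))
for row in names:
    cells = []
    for col in names:
        if row == col:
            cells.append(f"{'--':>8s}")
        else:
            diff = percent_difference(rates[row], rates[col])
            cells.append(f"{diff:>7.0f}%")
    print(f"{row:<4s}" + "".join(cells))
```

Note that a rate 5.6 times as high is a difference of roughly 460%, not 560%, because the percentage measures the excess over the column's rate.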

Devel::NYTProf profiling results

For the profiling here and the shell timing below, messages are sent in a plain
for loop instead of via Benchmark.

sub main::zmqffi_send; # xsub
# spent 15.5s within main::zmqffi_send which was called 10000000 times, avg 2µs/call

sub ZMQ::LibZMQ3::zmq_send; # xsub
# spent 15.6s within ZMQ::LibZMQ3::zmq_send which was called 10000000 times, avg 2µs/call

Q: Why does the profiler indicate basically identical performance for the two xsubs,
while Benchmark reports a performance difference?
A: ???

Time in shell

$ time perl zmq-bench.pl
FFI ZMQ Version: 4.0.5

real    0m3.541s
user    0m3.510s
sys     0m0.027s

$ echo '10000000 / 3.541' | bc -lq
2824060.999 # Rate

$ time perl zmq-bench.pl
XS ZMQ Version: 4.0.5

real    0m2.390s
user    0m2.363s
sys     0m0.020s

$ echo '10000000 / 2.390' | bc -lq
4184100.418 # Rate

XS is 48% faster when timing on the shell.
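
The rates and the 48% figure above can be re-derived from the `real` times. A minimal sketch, with the timings copied from the two shell runs and the helper name chosen only for illustration:

```python
ITERATIONS = 10_000_000  # sends per run


def rate(real_seconds, iterations=ITERATIONS):
    """Sends per second for a run that took real_seconds of wall-clock time."""
    return iterations / real_seconds


ffi_rate = rate(3.541)  # FFI::Platypus run
xs_rate = rate(2.390)   # XS (ZMQ::LibZMQ3) run

# Percentage by which the XS rate exceeds the FFI rate.
speedup_percent = (xs_rate / ffi_rate - 1) * 100

print(f"FFI: {ffi_rate:,.0f} sends/s")
print(f"XS:  {xs_rate:,.0f} sends/s")
print(f"XS is {speedup_percent:.0f}% faster")
```

These are wall-clock rates for the whole process, so they include interpreter startup, unlike the CPU-time rates that Benchmark reports.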

  
## zmq-bench.c
#include <zmq.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>
#include <string.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    assert(ctx);

    void *socket = zmq_socket(ctx, ZMQ_PUB);
    assert(socket);

    /* Use a per-process ipc endpoint so concurrent runs do not collide. */
    char endpoint[256];
    snprintf(endpoint, sizeof endpoint, "ipc:///tmp/zmq-c-bench-%d", (int) getpid());

    /* NOTE: the calls below live inside assert(), so do not build with -DNDEBUG. */
    assert( -1 != zmq_bind(socket, endpoint) );

    int major, minor, patch;
    zmq_version(&major, &minor, &patch);

    printf("C ZMQ Version: %d.%d.%d\n", major, minor, patch);

    /* Send 10 million 5-byte messages, the same workload as zmq-bench.pl. */
    for ( int i = 0; i < (10 * 1000 * 1000); i++ ) {
        assert( -1 != zmq_send(socket, "ohhai", 5, 0) );
    }

    return 0;
}

## zmq-bench.pl
#
# Directly compare FFI::Platypus vs XS xsubs
#

use strict;
use warnings;
use v5.10;

use FFI::Platypus::Declare;
use ZMQ::LibZMQ3;

use ZMQ::FFI::Constants qw(:all);

use Benchmark qw(:all);

lib 'libzmq.so';

attach(
    ['zmq_ctx_new' => 'zmqffi_ctx_new']
        => [] => 'pointer'
);

attach(
    ['zmq_socket' => 'zmqffi_socket']
        => ['pointer', 'int'] => 'pointer'
);

attach(
    ['zmq_bind' => 'zmqffi_bind']
        => ['pointer', 'string'] => 'int'
);

attach(
    ['zmq_send' => 'zmqffi_send']
        => ['pointer', 'string', 'size_t', 'int'] => 'int'
);

attach(
    ['zmq_version' => 'zmqffi_version']
        => ['int*', 'int*', 'int*'] => 'void'
);

my $ffi_ctx = zmqffi_ctx_new();
die 'ffi ctx error' unless $ffi_ctx;

my $ffi_socket = zmqffi_socket($ffi_ctx, ZMQ_PUB);
die 'ffi socket error' unless $ffi_socket;

my $rv;

$rv = zmqffi_bind($ffi_socket, "ipc:///tmp/zmq-ffi-bench-$$");
die 'ffi bind error' if $rv == -1;

my $xs_ctx = zmq_ctx_new();
die 'xs ctx error' unless $xs_ctx;

my $xs_socket = zmq_socket($xs_ctx, ZMQ_PUB);
die 'xs socket error' unless $xs_socket;

$rv = zmq_bind($xs_socket, "ipc:///tmp/zmq-xs-bench-$$");
die 'xs bind error' if $rv == -1;


my ($major, $minor, $patch);
zmqffi_version(\$major, \$minor, \$patch);

say "FFI ZMQ Version: " . join(".", $major, $minor, $patch);
say "XS  ZMQ Version: " . join(".", ZMQ::LibZMQ3::zmq_version());


my $r = timethese 10_000_000, {
    'XS'  => sub {
        die 'xs send error ' if -1 == zmq_send($xs_socket, 'ohhai', 5, 0);
    },

    'FFI' => sub {
        die 'ffi send error' if -1 == zmqffi_send($ffi_socket, 'ohhai', 5, 0);
    },
};

cmpthese($r);


## zmq-ffi-wuclient.pl
use strict;
use warnings;
use v5.10;

use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_SUB);

say "Collecting updates from weather station...";

my $context = ZMQ::FFI->new();

my $subscriber = $context->socket(ZMQ_SUB);
$subscriber->connect("tcp://localhost:5556");

my $filter = $ARGV[0] // "10001 ";
$subscriber->subscribe($filter);

my $update_nbr = 100;
my $total_temp = 0;

my ($string, $zipcode, $temperature, $relhumidity);

for (1..$update_nbr) {
    $string = $subscriber->recv();

    ($zipcode, $temperature, $relhumidity) = split ' ', $string;
    $total_temp += $temperature;
}

printf "Average temperature for zipcode '%s' was %dF\n",
    $filter, int($total_temp / $update_nbr);

## zmq-ffi-wuserver.pl
use strict;
use warnings;

use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_PUB);

my $context = ZMQ::FFI->new();

my $publisher = $context->socket(ZMQ_PUB);
$publisher->bind("tcp://*:5556");

my ($zipcode, $temperature, $relhumidity, $update);

# for (1..1_000_000) { # publish constant number when profiling
while (1) {
    $zipcode     = rand(100_000);
    $temperature = rand(215) - 80;
    $relhumidity = rand(50) + 10;

    $update = sprintf(
        '%05d %d %d',
        $zipcode,$temperature,$relhumidity
    );

    $publisher->send($update);
}

## zmq-xs-wuclient.pl
use strict;
use warnings;
use v5.10;

use ZMQ::LibZMQ3;
use ZMQ::Constants qw(ZMQ_SUB ZMQ_SUBSCRIBE);
use zhelpers;

say 'Collecting updates from weather server...';

my $context = zmq_init();

my $subscriber = zmq_socket($context, ZMQ_SUB);
zmq_connect($subscriber, 'tcp://localhost:5556');

my $filter = @ARGV ? $ARGV[0] : '10001 ';
zmq_setsockopt($subscriber, ZMQ_SUBSCRIBE, $filter);

my $update_nbr = 100;
my $total_temp = 0;

my ($string, $zipcode, $temperature, $relhumidity);

for (1 .. $update_nbr) {
    $string = s_recv($subscriber);

    ($zipcode, $temperature, $relhumidity) = split ' ', $string;
    $total_temp += $temperature;
}

printf "Average temperature for zipcode '%s' was %dF\n",
    $filter, int($total_temp / $update_nbr);

## zmq-xs-wuserver.pl
use strict;
use warnings;

use ZMQ::LibZMQ3;
use ZMQ::Constants qw(ZMQ_PUB);
use zhelpers;

my $context = zmq_init();

my $publisher = zmq_socket($context, ZMQ_PUB);
zmq_bind($publisher, 'tcp://*:5556');

my ($zipcode, $temperature, $relhumidity, $update);

# for (1..1_000_000) { # publish constant number when profiling
while (1) {
    $zipcode     = rand(100_000);
    $temperature = rand(215) - 80;
    $relhumidity = rand(50) + 10;

    $update = sprintf(
        '%05d %d %d',
        $zipcode,$temperature,$relhumidity
    );

    s_send($publisher, $update);
}