Start the py-req-die.py REQ "client" first, then start the py-rep.py script. It will sit and wait for 10 seconds for py-req-die.py to do its business, which should take far less time, but what can I say, I am careful.
After the server starts up, the client connects, sends data, and then dies before the REP socket server can receive and handle it. At this point, anything the server sends on that socket goes off into nothingness.
What I expected was that the server would block on s.recv() until a client came back and sent something while the server was expecting it. There seems to be no point in having the REP server do huge calculations or something CPU intensive when the result is simply going to be swallowed up by the ether.
The whole point of REQ/REP sockets is to send a request and receive a reply. If the endpoint for the reply is already gone before the REP socket receives the request, then what is the point? I understand the same situation is possible when the REP socket has received the data and the client is then killed, but at least in that case we have made a best effort to get the reply back before the client was killed...
Is this how it is supposed to work? Am I thinking of this in the wrong way?
I want to implement an LRU queue. In older versions of ZeroMQ (2.x, as far as I can remember; I will test as soon as I have time), if a REQ socket went away before the REP socket received its message from ZeroMQ, that data would be lost. This is the behavior I expected and wish were the case.
The LRU queue would work as follows:
Worker 1 --REQ--> Server
Worker 2 --REQ--> Server
Worker 3 --REQ--> Server
Server fetches items to process from mail server
Server --REP--> Worker 1
Server --REP--> Worker 2
Server --REP--> Worker 3
Worker 3 --REQ--> Server
Server --REP--> Worker 3
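The flow above can be modelled without any sockets. This is only a minimal sketch, not the actual server: LRUQueue, worker_ready, and dispatch are names made up for illustration, the "mail-a" style items are placeholders, and a plain deque stands in for the REQ messages waiting on the server.

```python
from collections import deque


class LRUQueue:
    """Toy model of the server: workers that sent a REQ wait in arrival order."""

    def __init__(self):
        self.ready = deque()  # workers with an outstanding REQ, oldest first

    def worker_ready(self, worker):
        # Models "Worker N --REQ--> Server".
        self.ready.append(worker)

    def dispatch(self, item):
        # Models "Server --REP--> Worker N": the longest-waiting worker gets the item.
        worker = self.ready.popleft()
        return worker, item


queue = LRUQueue()
for name in ("Worker 1", "Worker 2", "Worker 3"):
    queue.worker_ready(name)

# The server fetches three items and replies to the workers in arrival order.
assignments = [queue.dispatch(item) for item in ("mail-a", "mail-b", "mail-c")]

# Worker 3 finishes first, asks again, and gets the next item.
queue.worker_ready("Worker 3")
assignments.append(queue.dispatch("mail-d"))
```

The model assumes every queued worker is still alive when dispatch is called, which is exactly the assumption that breaks in the case I am asking about.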
Now, the server does have a retry algorithm, so if, for example, Worker 2 died while processing, its item would eventually be picked up by the next worker to come back to us. However, if Worker 1 and Worker 2 die before we even get a chance to call recv() on the REP socket, then at least two time-outs need to fire before we get to Worker 3, who then gets the work that was destined for Worker 1/2. This means that for the first request in a long time (let's say on an unstable network with lots of Workers joining and leaving), we may not process the message at all if we hit some sort of retry limit before any of our Workers respond.
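To put rough numbers on that concern, here is a small simulation. It is a sketch under stated assumptions: deliver, timeout, and retry_limit are hypothetical names and values, and it assumes the server cannot tell a dead worker's queued request from a live one, so each stale request costs one full time-out.

```python
from collections import deque


def deliver(ready, alive, timeout=5.0, retry_limit=3):
    """Hand one item to queued workers in order until a live one takes it.

    ready: worker names whose REQ is already queued on the server, oldest first.
    alive: set of workers that are still running.
    Returns (worker or None, attempts made, seconds lost to time-outs).
    """
    ready = deque(ready)
    attempts = 0
    waited = 0.0
    while ready and attempts < retry_limit:
        worker = ready.popleft()
        attempts += 1
        if worker in alive:
            return worker, attempts, waited
        # The reply went to a dead worker; only the time-out reveals that.
        waited += timeout
    return None, attempts, waited


queued = ["Worker 1", "Worker 2", "Worker 3"]

# Workers 1 and 2 died before recv(): two time-outs fire before Worker 3 gets the item.
print(deliver(queued, alive={"Worker 3"}))                 # ('Worker 3', 3, 10.0)

# With a retry limit of 2 the item is given up on before Worker 3 is ever tried.
print(deliver(queued, alive={"Worker 3"}, retry_limit=2))  # (None, 2, 10.0)
```

If requests from dead workers were dropped before the server saw them, the first call would reach Worker 3 on the first attempt with no waiting.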