@EdJ
Created December 3, 2013 11:56
Example of the inability to have a unified zero-downtime deployment method when using node.js's cluster module with non-net servers. This causes an issue with PULL-based servers, for example a component service backed by RabbitMQ. There are ways around this problem, but a unified method would be nice.
// app.js -- presumably a separate file, loaded via require('./app') below.
var http = require('http');

var Application = function Application() {
  this.value = Math.random().toFixed(2);
  console.log('Worker switched to value ' + this.value);
  this.handleRequest = function (r, rs) {
    console.log('Worker returning value ' + this.value);
    rs.end(this.value.toString());
  };
};

var application = new Application();
var server = http.createServer(application.handleRequest.bind(application));
// Starting a server causes a 'listening' event. Although the listening
// handle really resides within the cluster master, the event is relayed
// so that it is also emitted for the cluster worker.
server.listen(1234);
var cluster = require('cluster');
var workers = [];

// Zero-downtime deploy: fork a replacement, then retire the old worker
// once the replacement is ready.
var zdd = function (oldWorker) {
  console.log(oldWorker.id + ' being replaced');
  // Fork a new worker.
  var newWorker = cluster.fork();
  var timeout;
  // When the new worker comes online..
  newWorker.on('listening', function () {
    //console.log(newWorker.id + ' listening');
    // ..set up a kill switch for the old worker, in case it never disconnects..
    timeout = setTimeout(function () {
      oldWorker.kill();
    }, 2000);
    // ..set up a logging trigger for the old worker..
    oldWorker.once('disconnect', function () {
      console.log(oldWorker.id + ' disconnected');
      clearTimeout(timeout);
    });
    // ..then disconnect the old worker.
    oldWorker.disconnect();
  });
  return newWorker;
};

// Assume some external trigger for zdd, not just an interval.
var rePopulate = function () {
  // Delete the cached module, so the app can be reloaded.
  delete require.cache[require.resolve('./app')];
  for (var i = workers.length; i--;) {
    workers[i] = zdd(workers[i]);
  }
  setTimeout(rePopulate, 5000);
};

if (cluster.isMaster) {
  // Spawn some workers.
  for (var i = 2; i--;) {
    workers.push(cluster.fork());
  }
  // Only the master replaces workers; scheduling this unconditionally
  // would also run it in each worker, where cluster.fork() does not belong.
  setTimeout(rePopulate, 5000);
} else {
  // Delete the cached module, so the app can be reloaded.
  delete require.cache[require.resolve('./app')];
  // Load an external application, unaware of its own clustering.
  if (!process.env.PULL_BASED) {
    // This will work properly.
    require('./app');
  } else {
    // n.b. this is non-working code.
    require('./non-http-app');
  }
}
// non-http-app.js -- presumably a separate file, loaded via
// require('./non-http-app') above.
// A pretend PULL-based server.
// n.b. `new require('net').Socket()` parses as `(new require('net')).Socket()`,
// so the module is loaded first and the constructor called explicitly here.
var net = require('net');
var socket = new net.Socket();
var timer;

socket.connect(1234, 'some host or other', function () {
  // At this point, I would like to raise the 'listening' event on the worker, but I cannot.
  // Unfortunately the 'listening' event is actually emitted within the master, and it's
  // simply not possible to emit it from here. This makes it impossible to emulate a net server.
  socket.on('data', function () {
    // Do stuff.
    console.log('incoming data');
  });
  timer = setInterval(function () {
    socket.write('getFragment');
  }, 3000);
});

// This would be nice. Unfortunately there's actually an "internal event"
// hard-coded into the cluster source, prefixed with NODE_CLUSTER_.
// I don't have the time to get to the bottom of how to bind to the worker.disconnect
// event again, but suffice it to say it was possible (with some fudging) the last time I
// looked at this issue. That'll teach me not to back up my experiments :)
process.on('disconnect', function () {
  console.log('disconnect');
  clearInterval(timer);
  socket.end();
});
@sam-github

A couple of comments:

  • You should use https://github.com/strongloop/strong-cluster-control; it wraps the restart for you.
  • There is a 'disconnect' event in the worker; you get it AFTER the IPC channel disconnects, and you can use it to close your net client socket.
  • Reread the node cluster docs; I've added docs about the messages and when they happen.
  • On hooking the disconnect cmd to the worker, see https://gist.github.com/sam-github/8185222. I think it should be a documented API, but it isn't right now. See strong-cluster-control's worker.shutdown(); it sends such a msg for you.
  • When you say "pull based server", what you mean is "tcp client"... clients don't listen. There is no listen, in worker or in master. After you get a connection in your worker and you know the worker is alive, send a custom msg to your master: process.send('ONLINE'), and use that.
  • Your delete of require.cache is baffling... what do you think it's doing? ./app is required only once in the worker, if at all (because of the PULL_BASED env var), and it happens after the cache delete...
