@brianloveswords
Last active December 25, 2015 15:29
var http = require('http')
var count = 0

// this emulates hitting the max number of open sockets
http.globalAgent.maxSockets = 1

http.createServer(function (req, res) {
  count++;
  (function (count) {
    req.on('close', function () {
      console.log('socket timeout for request #%s', count)
    })
    if (count == 1) {
      console.log('dropping request #%s', count)
    } else {
      console.log('serving request')
      res.end('' + Date.now())
    }
  }(count));
}).on('listening', function () {
  var addr = this.address()
  console.log(addr.address, addr.port)
}).listen(8000)

OpenBadger server killer: hanging requests

The idea here is that some request never resolves its socket with a response (i.e., the handler never calls response.end()). When that happens, the request socket remains open indefinitely. If the client is a browser, there is a hard timeout after (if I recall correctly) 60 seconds.

The problem is that with server-to-server API calls, the requesting client will never time out the connection unless we do so manually. This means that once the number of zombie sockets reaches the ulimit -n limit, the server itself can no longer assign sockets to new connections. The behavior we see is that a request is able to come in, but it just sits in the waiting pool for an available socket, which never arrives because we have a ton of open sockets and no handles to them.
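Doing so manually on the calling side could look something like the sketch below. It is not taken from the OpenBadger code: getWithTimeout is a hypothetical helper name, and the sketch assumes the caller uses Node's built-in http module. Note that req.setTimeout measures socket inactivity, which for a server that never answers is effectively the whole request.

```javascript
var http = require('http');

// Hypothetical helper (not from OpenBadger): GET `options` and give up
// if the socket stays idle for `ms` milliseconds.
function getWithTimeout(options, ms, callback) {
  var done = false;

  // make sure the callback runs at most once
  function finish(err, body) {
    if (done) return;
    done = true;
    callback(err, body);
  }

  var req = http.get(options, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { finish(null, body); });
  });

  // fires after `ms` of socket inactivity; destroying the request
  // closes the socket and emits 'error' with the error passed in
  req.setTimeout(ms, function () {
    req.destroy(new Error('request timed out after ' + ms + 'ms'));
  });

  req.on('error', finish);
  return req;
}
```

A caller pointed at the demo server above (assuming it is listening on port 8000) might use getWithTimeout({ host: 'localhost', port: 8000, path: '/' }, 5000, cb), where cb receives either an error or the response body.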

What we still need to figure out

We don't yet know which route is causing the problem. It's very likely that it's an API call, since browser calls will time out.

Strategy

Chris and I discussed (and Isaac confirmed) that sticking some middleware in front of all the routes with a hard timeout and logging is the best bet. We can define a threshold, say 45 seconds, and treat anything above it as a zombie request. (According to JP our longest valid request is 22 seconds, which itself is gross and we need to fix.) When a request crosses the threshold, we can respond with a 500 and send something to logging at a high-alert level indicating which route is trying to hold sockets open. We can also manually close the socket so the server doesn't get knocked out of rotation.

// I complained on twitter and isaacs offered this up:
// [16:45:08] <@isaacs> brianloveswords: you could write a middleware that does a setTimeout and on res.end() clears it, and if the timeout fires, logs some shit
function swordlovin(q, s, c) {
  // q = request, s = response, c = next middleware
  var t = setTimeout(f, 2000); // 2 seconds
  // wrap the original s.end so that any response clears the timer
  s.end = (function (o) {
    return function () {
      clearTimeout(t);
      return o.apply(this, arguments)
    }
  }(s.end));
  // the timer fired: nothing responded in time, so fail and log the URL
  function f() {
    s.statusCode = 500;
    s.end('failure!');
    console.error(q.url)
  }
  return c()
}
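A variation on the same idea, closer to the strategy described above, might take the threshold and the logger as parameters and make sure the socket gets closed. This is only a sketch: zombieGuard and its arguments are hypothetical names, it assumes Connect/Express-style middleware, and it listens for the response's 'finish' and 'close' events instead of wrapping res.end.

```javascript
// Hypothetical factory (names are illustrative): returns middleware that
// treats any request still unanswered after `thresholdMs` as a zombie.
function zombieGuard(thresholdMs, log) {
  log = log || console.error;

  return function (req, res, next) {
    var timer = setTimeout(function () {
      // report which route held the socket open
      log('zombie request: ' + req.method + ' ' + req.url);

      if (!res.headersSent) {
        // ask Node to close the connection once the 500 has been written
        res.setHeader('Connection', 'close');
        res.statusCode = 500;
        res.end('request timed out');
      } else if (req.socket) {
        // a response was started but never finished: drop the socket
        req.socket.destroy();
      }
    }, thresholdMs);

    function clear() { clearTimeout(timer); }

    // 'finish' fires when the response has been handed off for sending;
    // 'close' fires if the connection goes away first
    res.on('finish', clear);
    res.on('close', clear);

    next();
  };
}
```

With an Express or Connect app (assumed here as app), this would be mounted ahead of the routes, for example app.use(zombieGuard(45000, console.error)), using the 45-second threshold suggested above.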