|
#### #linode on oftc |
|
#### 2015-04-29 US/Eastern |
|
11:39:20 <rduplain> anyone know if NodeBalancer passive checks are configurable? |
|
11:39:25 <rduplain> The let-it-crash programming model falls to pieces, because after some arbitrary number of 5xx responses, your node gets removed from rotation.
|
11:42:29 <akerl> Um... that's the let-it-crash model working |
|
11:43:01 <akerl> and no, they aren't configurable, except in the way that if you switch to TCP mode there are no passive checks |
|
11:44:35 <rduplain> akerl: that's not my experience. passive checks are clearly happening with TCP active mode. |
|
11:44:48 <akerl> that is false |
|
11:45:02 <rduplain> I disagree. |
|
11:45:16 <akerl> in TCP mode (the mode, not the style of active checks), there are no passive checks on 5xx errors because it's not reading the HTTP traffic, because it's just proxying the TCP connections |
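A minimal sketch of the distinction akerl is drawing: a TCP-mode balancer just shuttles opaque bytes between client and backend, so it has no idea whether a 500 went past. The one-shot relay below is illustrative only (a real balancer multiplexes many concurrent connections); none of the names come from the conversation.

```python
import socket

def relay_once(listen_sock, backend_addr):
    """Accept one client and pipe its bytes to the backend and back.

    Nothing here parses HTTP: the payload could contain a 500 status
    line and this proxy would never know, which is why pure TCP mode
    cannot do passive 5xx checks.
    """
    client, _ = listen_sock.accept()
    backend = socket.create_connection(backend_addr)
    data = client.recv(65536)        # opaque bytes, not an HTTP request to us
    backend.sendall(data)
    backend.shutdown(socket.SHUT_WR)
    reply = b""
    while chunk := backend.recv(65536):
        reply += chunk
    client.sendall(reply)            # again: opaque bytes, status code unseen
    client.close()
    backend.close()
```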
|
11:45:31 <rduplain> I see. Thanks for clarifying. |
|
11:45:34 <rduplain> I'm using HTTPS mode. |
|
11:46:37 <MajObviousman> so then how are you passively scanning responses for 5xx?
|
11:46:46 <akerl> Yes, so you get 5xx passive checks. If your backends are throwing 5xx errors, it should mean they are unhealthy, ergo they get pulled |
|
11:47:14 <rduplain> We send 500 responses to the client, we don't want NodeBalancer to make that assumption for us. |
|
11:47:19 <akerl> MajObviousman: HTTPS mode terminates SSL at the NodeBal, and the NodeBal does passive health checks where if your backends throw 5xx codes back for requests from users, they get pulled |
|
11:47:34 <akerl> rduplain: What's a scenario where you throw a 5xx that doesn't mean "server error" |
|
11:47:51 <MajObviousman> ahhh duh. Yeah sorry, I forgot that was a feature |
|
11:48:02 <MajObviousman> load up that NB! |
|
11:48:57 -*- MajObviousman spent way too long working with load balancers where SSL termination was an expensive add-on feature, and so nobody opted for it |
|
11:49:46 <akerl> I'm only really a fan of SSL termination when it's followed by SSL renegotiation, which doesn't happen here and has pretty bleh performance characteristics at scale |
|
11:50:42 <MajObviousman> so then what's the purpose of the termination if you're just re-encrypting it to the back-end node?
|
11:50:59 <MajObviousman> just to look at the contents? |
|
11:51:17 <MajObviousman> you don't have to re-encrypt to do that |
|
11:51:36 <akerl> Mostly load balancing. Also your backend nodes and balancer nodes can trust based on their own happy internal certs rather than the expensive dangerous public cert |
|
11:52:34 <MajObviousman> to each his own, I suppose |
|
11:53:25 -*- MajObviousman personally doesn't mind having the expensive, dangerous public cert in both the LB and back-end nodes, if SSL to the node is mandatory |
|
11:55:43 <jrhunt> akerl, is your objection that keeping the private key to the public cert on *all* the webservers in the pool + the LBs is more dangerous than just having it on the LBs? |
|
11:57:31 <akerl> I wouldn't call it an objection, but yes, keeping a secret in more places is absolutely less secure |
|
11:57:38 <MajObviousman> sure |
|
11:57:59 <MajObviousman> I suspect we are assigning vastly different weight to our risk assessments of that particular item |
|
11:58:01 <akerl> In my case, the backend nodes have 0 access to the internet, and I also don't want to trust the network that is not entirely in my control |
|
11:58:39 <akerl> so everything inside the circle does trust on internal CAs already, and everything outside the circle does trust on the external cert already |
|
11:59:07 <akerl> Thus, having the LB -> backend connection use certs that already exist everywhere they need to be Just Makes Sense |
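akerl's setup (backends trust an internal CA, the public cert lives only at the edge) can be sketched with Python's `ssl` module. The `internal_ca` path is a placeholder, not something from the discussion; passing `None` falls back to system roots so the sketch stays runnable.

```python
import ssl

def balancer_client_context(internal_ca=None):
    """TLS context for the LB -> backend hop.

    In the setup described, internal_ca would point at the private
    internal CA bundle (placeholder path, assumed), so backends never
    need the expensive, dangerous public cert or its private key.
    """
    ctx = ssl.create_default_context(cafile=internal_ca)
    # create_default_context already enforces certificate verification
    # and hostname checking, which is exactly the trust the internal
    # CA is meant to provide on the inside of the circle.
    return ctx
```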
|
12:02:45 <MajObviousman> it makes sense from a security standpoint, but there's a thought running around in the back of my head shouting, "This won't scale cheaply!" |
|
12:03:30 <akerl> You mean the SSL renegotiation cost? or having to deal with certs for all the backend nodes? |
|
12:04:03 <MajObviousman> no the certs are free |
|
12:04:07 <MajObviousman> but SSL computation is not |
|
12:04:14 <MajObviousman> you're trebling it |
|
12:04:17 <akerl> Yea, that was my initial sadness :) |
|
12:04:30 <akerl> "and has pretty bleh performance characteristics at scale" |
|
12:04:40 <MajObviousman> oh yes, yes you did state that up front |
|
12:04:55 <MajObviousman> again, different criteria in our individual risk assessments :) |
|
12:05:12 <MajObviousman> unrelated topic, coming down to NC this year? |
|
12:05:15 <MajObviousman> or did I already ask you that? |
|
12:05:33 <akerl> I might be. Depends on how crazy the real world is |
|
12:05:55 <akerl> is it at the same place this year? |
|
12:06:00 <MajObviousman> yep |
|
12:08:27 <rduplain> akerl: "server error" != node is unhealthy. in our case, the server error would happen on all nodes. |
|
12:08:39 <rduplain> i.e. some weird state happened |
|
12:09:08 <rduplain> akerl: the reason I moved from TCP mode to HTTPS is that I want real IP. Is there a way to get that with TCP mode? |
|
12:09:14 <akerl> rduplain: I feel like this is a fundamental difference regarding the spec |
|
12:09:33 <akerl> No, there is not a way to get the originating IP in TCP mode |
|
12:09:46 <akerl> Yes, 5xx errors mean the node is unhealthy, because otherwise it wouldn't be throwing errors
|
12:10:04 <rduplain> errors happen |
|
12:10:11 <rduplain> it's not the node that's unhealthy |
|
12:10:16 <rduplain> so I don't want it removed |
|
12:10:40 <akerl> If the error being thrown means something else ("client gave me a bad method", or "there were no results" or "try again later"), give the right code for that, they exist, all in happy 4xx land |
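akerl's point, sketched with the stdlib `http.server`: reserve 5xx for genuine server faults and map "no results" or "try again later" into 4xx land, so a balancer's passive checks only pull nodes that are actually broken. The routes and handler below are illustrative assumptions, not anything from the log.

```python
import http.server
import json

class StatusDemoHandler(http.server.BaseHTTPRequestHandler):
    """Toy handler showing status-code choices a passive 5xx check would see."""

    def do_GET(self):
        if self.path == "/missing":
            self.reply(404, {"error": "no results"})        # client-side condition
        elif self.path == "/busy":
            self.reply(429, {"error": "try again later"})   # client-side: back off
        elif self.path == "/broken":
            self.reply(500, {"error": "server fault"})      # genuinely unhealthy
        else:
            self.reply(200, {"ok": True})

    def reply(self, code, body):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet
```

With this scheme, only `/broken` would count against the node in a passive 5xx check.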
|
12:10:43 <rduplain> I agree that NodeBalancer wasn't designed for that, but it's surprising to me, since we're not doing anything weird (though clearly you think we are). |
|
12:10:49 <akerl> You are |
|
12:10:55 <rduplain> Haha, okay. |
|
12:11:02 <akerl> 5xx is designed to represent "the server is misbehaving" |
|
12:11:29 <akerl> and, as you cited up front, the idea of let-it-crash is that a misbehaving server should die and be replaced with a non-misbehaving server |
|
12:12:52 <rduplain> the issue here is that NodeBalancer doesn't let me configure it to have my code decide how to replace the misbehaving server |
|
12:13:10 <rduplain> because it's not the server, it's some subsystem of mine |
|
12:13:26 <akerl> That's because nodebalancers implement the balancing part |
|
12:13:31 <rduplain> I got that. |
|
12:13:36 <rduplain> I still want configuration here. |
|
12:13:38 <akerl> You'd handle detecting and acting on failures via the API |
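What "detecting and acting on failures via the API" might look like, as a hedged sketch: probe backends yourself and, past a threshold, call out to the provider's API to pull the node. The `act` callback stands in for a real Linode API request (not shown here), and the probe logic and threshold are assumed policy, not anything specified in the conversation.

```python
import urllib.error
import urllib.request

FAILURE_THRESHOLD = 3  # consecutive failures before acting (assumed policy)

def probe(url, timeout=2.0):
    """Return True if the backend answers 2xx on its health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

def check_backends(backends, failures, act):
    """Update per-backend failure counts; call act(node) at the threshold.

    backends: {node_name: health_url}; failures: mutable counter dict;
    act: callback that would issue the real API request to remove the
    node from rotation (placeholder for this sketch).
    """
    for node, url in backends.items():
        if probe(url):
            failures[node] = 0
        else:
            failures[node] = failures.get(node, 0) + 1
            if failures[node] == FAILURE_THRESHOLD:
                act(node)
    return failures
```

Run on a timer, this gives the configurability being asked for: your own code, not the balancer's passive checks, decides when a node leaves rotation.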
|
12:13:51 <rduplain> I just want to turn off passive checks. |
|
12:14:31 <rduplain> Thanks a lot for the discussion, akerl. This has been useful. |