Why is IRC distributed across multiple servers?

I have been wondering for a long time why IRC networks have multiple servers. Wouldn't it be simpler just to use a single server?

One of the problems of having multiple servers is that netsplits can occur. Anybody who has been on IRC for a while will have witnessed one: hundreds of people suddenly ripped out of the chat. This can also mess up channel and user modes, and 'some people' have been known to wait for netsplits in order to take over channels or enter password-protected channels.

So let's compare (A) a single IRC server that everyone connects to with (B) the setup networks actually use: multiple servers. Let's say you run an IRC network with u = 40,000 users and n = 20 server nodes that people connect to via round-robin DNS (meaning that when a client resolves the DNS name, it gets a random server from the set of 20 to connect to). These are vaguely realistic numbers modelled after libera.chat.
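The round-robin DNS load spreading can be sketched in a few lines of Python (the hostnames are made up, and picking a uniformly random server is a simplification of how resolvers rotate A records, but the resulting load distribution is the same):

```python
import random
from collections import Counter

random.seed(0)  # deterministic for the example

# Hypothetical server pool: n = 20 nodes
SERVERS = [f"irc{i}.example.net" for i in range(20)]

def resolve():
    """Model one DNS lookup: the resolver hands back a random node."""
    return random.choice(SERVERS)

# u = 40,000 users connecting -> roughly u/n = 2,000 clients per node
loads = Counter(resolve() for _ in range(40_000))
print(min(loads.values()), max(loads.values()))  # every node ends up near 2,000
```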

So in (B) you have roughly u/n = 2,000 clients connected to each server node, and each server node also connects to some number 1 <= b <= n-1 of the other servers; maybe a typical server is connected to b = 3 others. They need to do this so that messages from clients connected to one server reach clients connected to the other servers.

We can analyze the usage of 3 resources: bandwidth, RAM and CPU.

Bandwidth

Imagine 10% of the clients send a message at one instant in time. What happens?

A

  • Input: The server receives 4,000 messages from 40,000 different TCP connections.
  • Output: The server must send every message it gets to every other user: 4,000 * (u - 1) = 159,996,000 messages sent out across all 40,000 connections.
  • RAM/CPU: The server software has to maintain 40,000 connections, parsing data from them.
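The arithmetic for (A) is easy to check, a quick sketch using the numbers above:

```python
u = 40_000          # connected users
senders = u // 10   # 10% of clients send one message -> 4,000 messages in

# Broadcast model: every message goes to every user except its sender
out = senders * (u - 1)
print(f"{out:,}")   # 159,996,000 messages out
```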

B

  • Input: The server receives a total of 4,000 messages as before, but they arrive as lots of small messages from its 2,000 clients and in larger batches from the b = 3 other servers.
  • Output: The output is quite different. The server needs to relay the 400 messages it got to the 3,999 clients that didn't send it, but it also needs to relay these messages out to the b = 3 other servers it's connected to. So it sends out a total of 400 * (3999 + 3) = 1,600,800 messages. This is about 100x less than with a single server.
  • RAM/CPU: The server software has to maintain 2,000 + 3 connections. Far fewer, so this will take up much less RAM and CPU.
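The same sketch for (B), reproducing the per-server figures as stated above (the comments below debate whether those counts are right, so treat this as checking the article's arithmetic, not the model):

```python
u = 40_000
b = 3                        # server links per node
msgs = 400                   # messages relayed, as stated above
recipients = 3_999 + b       # clients that didn't send it, plus peer servers

out_b = msgs * recipients
out_a = (u // 10) * (u - 1)  # scenario (A) total, for comparison

print(f"{out_b:,}")          # 1,600,800
print(round(out_a / out_b))  # ratio vs. scenario (A)
```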

Conclusion

So distributing an IRC network across multiple servers means each server does not have to manage as many connections, which reduces RAM and CPU usage. Each server also sends far less data out over the network, although the total amount of data coming in stays the same.

@patrickshafer commented Sep 12, 2021

A thought on the 2nd bullet under scenario A: a message from a client doesn't get propagated to all other clients or servers. A privmsg command needs to look up the intended recipients and route the message accordingly.

For example, a privmsg between two users will only be visible to those users and the servers between them.

A privmsg to a channel will only be visible to the members of the channel and the servers between them; it's also only sent once.

This means that the bandwidth consumption is considerably less as it's rare to have all connected clients in a single channel.

@mattkeenan commented Sep 12, 2021

The answer lies in the history of IRC. It was developed in the late 80s and took inspiration from BITNET Relay and USENET. USENET itself grew on top of UUCP from 1979 (which predates TCP/IP and RFC 822 email). These systems relied on store and forward because network connections weren't always on. And even in the late 80s, intercontinental Round Trip Times (RTTs; aka pings) were frequently above 2 seconds. These network issues meant that having multiple servers and a store-and-forward protocol was a necessity. And being a necessity, it was baked into the protocol, and IRC clients and servers still need to support it today.

There is also the side effect that administration of the servers is also distributed; which can be both a good and a bad thing.

@pjz commented Sep 12, 2021

> This means that the bandwidth consumption is considerably less as it's rare to have all connected clients in a single channel.

When analyzing for efficiency, a simplifying practice is to consider the worst-case scenario, which here would be 'all clients in one channel'.

I'm surprised the author doesn't mention what I consider the main reason: High Availability aka failover: if one machine dies for whatever reason, the whole service doesn't go down. And your failover isn't sitting around idle and untested, it's actively contributing capacity to the service, which means that the likelihood of your failover not working when needed is reduced or eliminated.

@jbash commented Sep 12, 2021

At the time when the IRC protocol was conceived, it would have been thought of as weird and frankly irresponsible to make a major communication network dependent on a central server, and it would have been thought of as weird and irresponsible to put a major communication network under the control of central administration.

This attitude was of course correct.

@wanderingstan commented Sep 12, 2021

More than “weird and irresponsible”, it would have been a poor design choice. Many “server connections” were scheduled sessions over dial-up modems. 100% uptime wasn’t a thing. Being distributed wasn’t so much a philosophical choice as a pragmatic choice.

@jbash commented Sep 12, 2021

100% uptime is still not a thing, and avoiding single points of failure IS a pragmatic issue. Always has been.

@chowder commented Sep 12, 2021

> The output is quite different. The server needs to relay the 400 messages it got to the 3,999 clients that didn't send it

Did you mean 4000 messages here?

@cobratbq commented Sep 12, 2021

I don't think I understand your example for A: out of 40,000 connected clients/users, about 4,000 would send a message at an instant in time. Clients/users would be joined to different channels. Let's say each user is in a channel with 100 other users. (So 101 channel members in total. Convenient number for calculations.) So, 4,000 * 100 = 400,000 messages need to be transmitted to distribute the incoming messages to all their recipients. That's significantly less than 159,996,000. Am I missing something?

@ssg commented Sep 12, 2021

The Internet back then wasn't as reliable as it is today. It was usual to lose access to certain parts of it, or for servers to go down for arbitrary reasons. The multi-server architecture helped people keep communicating while the issues were being worked on. Same with FTP mirrors.

@ipv6king commented Sep 12, 2021

Sweet memories.
