erkie/Load Balancing FTP.txt

## README.md

      
    Original article is 404: http://ben.timby.com/?page_id=210
Retrieved from: https://web.archive.org/web/20130404121626/http://ben.timby.com/?page_id=210

  
## Load Balancing FTP.txt
Load Balancing FTP.
If you run an FTP server at scale, you will eventually want to load balance it. This is no mean task as FTP is a notoriously finicky protocol. Those familiar with FTP will know that it uses more than one TCP connection; the first connection is the command channel and the second is the data channel. To successfully load balance FTP, you must address both of these connections.

To further complicate matters, the data channel can be established using two methods: FTP Active or FTP Passive. For the rest of this document, I will simply use the terms active and passive to refer to these modes. First, let’s review how the command and data channels are used in FTP.

Active FTP.
When using FTP in active mode, the FTP client first connects to the server on port 21. This opens the command channel. The client authenticates itself, sets options, retrieves feature support from the server, and so on. The data channel is not opened until the client makes a request that results in a data transfer. File transfers and directory listings are the commands that initiate a data channel. Before issuing such a command, the client issues a PORT command that gives the server the client’s IP address and a port on which to connect. The client then waits for the connection and receives the command result over the newly established data channel.

> PORT 1,1,1,1,7,226
In the above command, the IP address is 1.1.1.1. The port is encoded as follows: the first octet (7) is multiplied by 256, then added to the second octet (226), yielding a port number of 2018. The server will open a connection to 1.1.1.1:2018. The specific port range used for active connections is controlled by the FTP client.
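
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the encoding; the function names `decode_host_port` and `encode_host_port` are made up for this example and do not come from any FTP library.

```python
from typing import Tuple


def decode_host_port(arg: str) -> Tuple[str, int]:
    """Decode 'h1,h2,h3,h4,p1,p2' (the PORT argument) into (ip, port)."""
    fields = [int(x) for x in arg.split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError(f"malformed host-port argument: {arg!r}")
    ip = ".".join(str(f) for f in fields[:4])
    # The port is sent as two octets: high byte first, then low byte.
    port = fields[4] * 256 + fields[5]
    return ip, port


def encode_host_port(ip: str, port: int) -> str:
    """Encode (ip, port) back into the comma-separated form."""
    high, low = divmod(port, 256)
    return ",".join(ip.split(".") + [str(high), str(low)])


print(decode_host_port("1,1,1,1,7,226"))  # ('1.1.1.1', 2018)
print(encode_host_port("1.1.1.1", 2018))  # 1,1,1,1,7,226
```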

Active FTP mode suffers from many problems imposed by NAT routers and firewalls. The client side must allow connections from the server and further ensure that they are routed to the appropriate LAN device.

Many network gateway devices and software have special FTP modes that monitor FTP command channels, looking for PORT commands, adding temporary firewall rules that allow the imminent connection from the server. These modes only work with plain FTP and are stymied by the use of FTPS, which encrypts the command channel, obscuring it from the gateway’s inspection.

Passive FTP.
For passive mode, the client establishes the data channel connection with the server. This is traditionally how Internet protocols operate. This mode places no special requirements onto the client-side except that it is able to establish outbound connections to the passive port range in use on the server. For a passive connection, the client issues a PASV command, to which the server replies with the connection details.

> PASV
< 227 Entering passive mode (2,2,2,2,30,55).
In the above sequence, the client requests a passive connection and the server provides the IP address and port number to connect to. The encoding is the same as in active mode: the IP address is 2.2.2.2 and the port is 30 * 256 + 55 = 7735. At this point, the FTP client will open a connection to 2.2.2.2:7735. Unlike active mode, the server controls the port range that is used for passive connections.
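
To make the client's side of this exchange concrete, here is a minimal sketch of how a client might pull the address and port out of the 227 reply. The regular expression and the name `parse_pasv_reply` are assumptions for illustration; real clients tend to be more forgiving about the reply text.

```python
import re
from typing import Tuple

# Six comma-separated numbers inside parentheses, after the 227 reply code.
PASV_REPLY = re.compile(
    r"^227 .*?\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)"
)


def parse_pasv_reply(line: str) -> Tuple[str, int]:
    """Return the (ip, port) the server is asking the client to connect to."""
    match = PASV_REPLY.match(line)
    if match is None:
        raise ValueError(f"not a 227 passive reply: {line!r}")
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError(f"octet out of range in reply: {line!r}")
    ip = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return ip, port


print(parse_pasv_reply("227 Entering passive mode (2,2,2,2,30,55)."))
# ('2.2.2.2', 7735)
```

Note that the client connects to whatever address the server advertises in this reply, which is what makes the address a server-side configuration concern.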

Passive mode generally *just works* as long as it is supported by the server. Therefore it is usually the preferred mode of operation. The exceptions are legacy FTP clients that don’t support passive mode.

Load Balancing.
To load balance the FTP protocol, you must handle not only the command channel, which can be easily accomplished using a variety of techniques, but also active and passive data connections.

Command channel.
Let’s stipulate that to load balance the command channel, we will use HAProxy. HAProxy can easily bind port 21 and distribute inbound connections amongst a pool of backend FTP servers. That was easy. A configuration for such a setup would look like the following:

/etc/haproxy/haproxy.cfg:

listen ftp-lb00
    bind 2.2.2.2:21
    mode tcp
    option tcplog
    balance leastconn
    server ftp-serv00 192.168.1.1:21 check
    server ftp-serv01 192.168.1.2:21 check
    server ftp-serv02 192.168.1.3:21 check
The above has three servers; new connections will be directed to the server with the fewest connections. The public IP address is 2.2.2.2, and connections are made to the 192.168.1.0/24 network. The machine running HAProxy is multi-homed and acts as a router/proxy between these two networks. You must set up NAT forwarding on this machine in iptables. Each of the backend FTP servers must use this machine as its default gateway. The NAT/gateway requirements are not dictated by HAProxy, but by our later work on active/passive data channel connections.
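
As a rough illustration of what "least connections" means for long-lived FTP command sessions, here is a toy sketch. It is not HAProxy's implementation (which also accounts for weights and queues); the function `pick_leastconn` and the tie-breaking by name are assumptions made for this example.

```python
def pick_leastconn(active_sessions):
    """Return the backend with the fewest active sessions (ties broken by name)."""
    return min(active_sessions, key=lambda name: (active_sessions[name], name))


# Active command-channel sessions per backend, using the server names above.
active = {"ftp-serv00": 0, "ftp-serv01": 0, "ftp-serv02": 0}
assignments = []

# Four clients connect one after another.
for _ in range(4):
    chosen = pick_leastconn(active)
    active[chosen] += 1
    assignments.append(chosen)

# A session on ftp-serv02 ends, so the next client lands there.
active["ftp-serv02"] -= 1
assignments.append(pick_leastconn(active))

print(assignments)
# ['ftp-serv00', 'ftp-serv01', 'ftp-serv02', 'ftp-serv00', 'ftp-serv02']
```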

Active Data Channel.
The active data channel is fairly easy to handle. In this mode, the connection is made from the backend server to the client. All that is needed is a NAT rule that performs SNAT from each backend server address to the public address of the load balancer. This is important because at least some (maybe all) FTP clients expect the active data channel connection to come from the same address as the command channel. The following rules on the load balancer would be sufficient.

iptables -t nat -A POSTROUTING -s 192.168.1.1/32 -o eth1 -j SNAT --to-source 2.2.2.2
iptables -t nat -A POSTROUTING -s 192.168.1.2/32 -o eth1 -j SNAT --to-source 2.2.2.2
iptables -t nat -A POSTROUTING -s 192.168.1.3/32 -o eth1 -j SNAT --to-source 2.2.2.2
This is all that is needed server-side for active mode. However, most clients will be unable to use active mode because of a local gateway. For the legacy clients that must use active mode, the onus is on them to configure their network to allow these connections from the server to the client.

Passive Data Channel.
The passive data channel is a bit harder to handle server-side. However, passive mode places very few restrictions on the client-side network gateway. In our scenario, we have three actual FTP servers. Any of the backend servers might need to establish a passive connection with a client. This situation can be handled using iptables/NAT as well as some FTP configuration. First, you must configure your backend FTP servers such that:

The servers all use the public IP address in the reply to PASV commands (2.2.2.2).
The servers each use a unique port range for passive connections.
As an example, ProFTPd can be configured as required using the MasqueradeAddress and PassivePorts directives.

With the above in place, you then need only ensure that the router performs DNAT and forwarding for the passive connections to the appropriate backend FTP server. The following table will illustrate the passive port ranges we assign to each backend.

Server	PASV Port Range
192.168.1.1	1025 – 2048
192.168.1.2	2049 – 3072
192.168.1.3	3073 – 4096
Continuing with our ProFTPd example, the following configuration directives would be part of each server’s configuration.

Server A.

Bind 192.168.1.1
Port 21
MasqueradeAddress 2.2.2.2
PassivePorts 1025 2048
Server B.

Bind 192.168.1.2
Port 21
MasqueradeAddress 2.2.2.2
PassivePorts 2049 3072
Server C.

Bind 192.168.1.3
Port 21
MasqueradeAddress 2.2.2.2
PassivePorts 3073 4096
The following iptables commands on the load balancer would set up the necessary DNAT for these passive ranges.

iptables -t nat -A PREROUTING -d 2.2.2.2/32 -i eth1 -p tcp -m tcp --dport 1025:2048 -j DNAT --to-destination 192.168.1.1
iptables -t nat -A PREROUTING -d 2.2.2.2/32 -i eth1 -p tcp -m tcp --dport 2049:3072 -j DNAT --to-destination 192.168.1.2
iptables -t nat -A PREROUTING -d 2.2.2.2/32 -i eth1 -p tcp -m tcp --dport 3073:4096 -j DNAT --to-destination 192.168.1.3
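
Because the whole scheme depends on the passive ranges never overlapping, it can help to keep the table in one place and derive everything else from it. The following is a small sketch of that idea, not part of the original setup: the helper names are hypothetical, `eth1` is assumed to be the public interface as in the rules above, and `-t nat` is passed explicitly because DNAT in PREROUTING belongs to the nat table.

```python
# Backend address -> (first, last) passive port, taken from the table above.
PASSIVE_RANGES = {
    "192.168.1.1": (1025, 2048),
    "192.168.1.2": (2049, 3072),
    "192.168.1.3": (3073, 4096),
}


def check_disjoint(ranges):
    """Raise ValueError if any two passive port ranges overlap."""
    ordered = sorted(ranges.values())
    for (_, prev_high), (next_low, _) in zip(ordered, ordered[1:]):
        if next_low <= prev_high:
            raise ValueError(f"ranges overlap at port {next_low}")


def backend_for_port(port, ranges=PASSIVE_RANGES):
    """Return the backend that owns a passive port, or None if nobody does."""
    for backend, (low, high) in ranges.items():
        if low <= port <= high:
            return backend
    return None


def dnat_rules(public_ip, iface, ranges=PASSIVE_RANGES):
    """Build one DNAT command line per backend range."""
    return [
        f"iptables -t nat -A PREROUTING -d {public_ip}/32 -i {iface} "
        f"-p tcp -m tcp --dport {low}:{high} -j DNAT --to-destination {backend}"
        for backend, (low, high) in ranges.items()
    ]


check_disjoint(PASSIVE_RANGES)
print(backend_for_port(2049))  # 192.168.1.2
for rule in dnat_rules("2.2.2.2", "eth1"):
    print(rule)
```

The script only prints the commands; applying them is left to whatever tooling manages the load balancer's firewall.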
Client IP Address.
With the above in place, you will have FTP load balanced. Active and passive mode will both work, and all is right with the world (or at least your FTP cluster). However, there is still one problem to solve. Because HAProxy is a proxy server, command channel connections are established from the load balancer to the backend servers. Thus the IP address in your FTP server logs will be the load balancer's IP address and not the actual client IP address. To resolve this issue, you will need to use the PROXY protocol to hand off the client IP to each backend server. This is a special extension to HAProxy available in version 1.5. With this available, you enable it by adding “send-proxy” to each server line in the HAProxy configuration. Our previous example then becomes…

/etc/haproxy/haproxy.cfg:

listen ftp-lb00
    bind 2.2.2.2:21
    mode tcp
    option tcplog
    balance leastconn
    server ftp-serv00 192.168.1.1:21 send-proxy check
    server ftp-serv01 192.168.1.2:21 send-proxy check
    server ftp-serv02 192.168.1.3:21 send-proxy check
This will likely require a patch to your FTP server to accept this extension. The other option is to use haproxy transparent mode, or a different load balancer such as LVS that relies on routing rather than proxying. I prefer HAProxy with the PROXY protocol, as that is a fully user-space solution (minus the NAT rules) providing a simple, proven and robust solution.
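
To give a sense of what the backend has to accept, here is a sketch of the human-readable (version 1) PROXY protocol header, which is a single text line sent ahead of the normal FTP traffic on the command channel. The helper names and the client source port 51000 are hypothetical; only the TCP4/TCP6 forms of the line are handled, and a real server-side implementation would need more validation than this.

```python
def build_proxy_v1(src_ip, src_port, dst_ip, dst_port):
    """Build the header line the proxy sends before any FTP data."""
    return f"PROXY TCP4 {src_ip} {dst_ip} {src_port} {dst_port}\r\n".encode("ascii")


def parse_proxy_v1(line):
    """Parse a version 1 header and return (src_ip, src_port, dst_ip, dst_port)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError(f"unsupported PROXY header: {line!r}")
    _, _, src_ip, dst_ip, src_port, dst_port = parts
    return src_ip, int(src_port), dst_ip, int(dst_port)


# A client at 1.1.1.1 (source port chosen arbitrarily) connecting to 2.2.2.2:21.
header = build_proxy_v1("1.1.1.1", 51000, "2.2.2.2", 21)
print(header)                  # b'PROXY TCP4 1.1.1.1 2.2.2.2 51000 21\r\n'
print(parse_proxy_v1(header))  # ('1.1.1.1', 51000, '2.2.2.2', 21)
```

The FTP server would read this line first, record 1.1.1.1 as the client address for logging and access control, and then carry on with the usual greeting.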