
@mohanpedala
Last active July 31, 2017 15:23
Forward Proxies and Reverse Proxies/Gateways in Apache

Apache HTTP Server can be configured in both forward and reverse proxy (also known as gateway) modes.

mod_proxy and related modules implement a proxy/gateway for Apache HTTP Server, supporting a number of popular protocols as well as several different load balancing algorithms. Third-party modules can add support for additional protocols and load balancing algorithms.

mod_proxy itself provides the basic proxy capabilities; the related modules (for example mod_proxy_http) add protocol support on top of it.

Forward Proxy

  1. An ordinary forward proxy is an intermediate server that sits between the client and the origin server.
  2. In order to get content from the origin server, the client sends a request to the proxy naming the origin server as the target.
  3. The proxy then requests the content from the origin server and returns it to the client.
  4. The client must be specially configured to use the forward proxy to access other sites.

Note: A typical usage of a forward proxy is to provide Internet access to internal clients that are otherwise restricted by a firewall. The forward proxy can also use caching (as provided by mod_cache) to reduce network usage.

The forward proxy is activated using the ProxyRequests directive. Because forward proxies allow clients to access arbitrary sites through your server and to hide their true origin, it is essential to secure your server so that only authorized clients can use the proxy before you activate it.
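
A minimal sketch of a restricted forward proxy for Apache 2.4 might look like the following. The port 3128 and the 192.168.1.0/24 network are assumptions chosen for illustration. mod_proxy and mod_proxy_http must be loaded. If clients need to tunnel HTTPS through the proxy, mod_proxy_connect is also required.

    # Hypothetical forward proxy on its own port, for an assumed internal network
    Listen 3128
    <VirtualHost *:3128>
        ProxyRequests On
        # Only clients from the assumed internal network may use the proxy
        <Proxy "*">
            Require ip 192.168.1.0/24
        </Proxy>
    </VirtualHost>

Clients on that network would point their browser or system proxy setting at port 3128 of this server. Requests from any other address are refused.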

Reverse proxy

  1. A reverse proxy (or gateway), by contrast, appears to the client just like an ordinary web server. No special configuration on the client is necessary.
  2. The client makes ordinary requests for content in the namespace of the reverse proxy. The reverse proxy then decides where to send those requests and returns the content as if it were itself the origin.
  3. A typical usage of a reverse proxy is to provide Internet users access to a server that is behind a firewall.
  4. Reverse proxies can also be used to balance load among several back-end servers or to provide caching for a slower back-end server.
  5. In addition, reverse proxies can be used simply to bring several servers into the same URL space.

Note: A reverse proxy is activated using the ProxyPass directive or the [P] flag to the RewriteRule directive. It is not necessary to turn ProxyRequests on in order to configure a reverse proxy.
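
The load-balancing use case can be illustrated with a hedged sketch of a reverse proxy that spreads requests over two back ends using mod_proxy_balancer (Apache 2.4). The host names, the port, the pool name "appcluster" and the /app/ path are all hypothetical. In 2.4 this setup also needs mod_slotmem_shm and a load-balancing method module such as mod_lbmethod_byrequests.

    # Hypothetical pool of two back-end servers
    <Proxy "balancer://appcluster">
        BalancerMember "http://backend1.example.com:8080"
        BalancerMember "http://backend2.example.com:8080"
        # Distribute by request count (provided by mod_lbmethod_byrequests)
        ProxySet lbmethod=byrequests
    </Proxy>
    ProxyPass        "/app/" "balancer://appcluster/"
    ProxyPassReverse "/app/" "balancer://appcluster/"

Note that ProxyRequests does not appear anywhere. As the note above says, a reverse proxy does not need it.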

Reference link: Reverse Proxy guide

Configuring the Proxy

As with any module, the first step is to load the proxy modules in httpd.conf (this is not necessary if they are built statically into Apache):

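    # Core proxy module and HTTP back-end support
    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so
    # Not needed for a plain HTTP reverse proxy
    #LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
    #LoadModule proxy_connect_module modules/mod_proxy_connect.so
    # Header manipulation and compression
    LoadModule headers_module modules/mod_headers.so
    LoadModule deflate_module modules/mod_deflate.so
    # libxml2 must be loaded before mod_xml2enc and mod_proxy_html;
    # adjust the path to wherever libxml2 lives on your system
    LoadFile /usr/lib/libxml2.so
    LoadModule xml2enc_module modules/mod_xml2enc.so
    LoadModule proxy_html_module modules/mod_proxy_html.so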

For Windows users this is slightly different. You'll need to load libxml2.dll rather than libxml2.so, and you'll probably need to load the iconv and zlib DLLs as prerequisites to libxml2 (you can download them from zlatkovic.com, the same site that maintains Windows binaries of libxml2). The LoadFile directive is used in the same way.

Of course, you may not need all the modules. Two that are not required in our typical scenario are shown commented out above.

Having loaded the modules, we can now configure the Proxy. But before doing so, we have an important security warning:

Do not set "ProxyRequests On". Setting ProxyRequests On turns your server into an open proxy. Bots scan the Web for open proxies, and when they find yours they'll start using it to route around blocks and filters to access questionable or illegal material. At worst, they might be able to route email spam through your proxy. Your legitimate traffic will be swamped, and you'll find your server getting blocked by things like family filters.

Of course, you may also want to run a forward proxy with appropriate security measures, but that lies outside the scope of this article. The author runs both forward and reverse proxies on the same server (but under different Virtual Hosts).

The fundamental configuration directive to set up a reverse proxy is ProxyPass. We use it to set up proxy rules for each of the application servers:


ProxyPass       /app1/  http://internal1.example.com/
ProxyPass       /app2/  http://internal2.example.com/

The [P] flag to mod_rewrite's RewriteRule offers an alternative to ProxyPass, but it is more complex and may in some instances degrade performance by making it impossible for Apache to use persistent proxy connections.
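
For comparison, the /app1/ rule above might be written with the [P] flag as in the following sketch. It assumes mod_rewrite is loaded and that the rule sits in server or virtual-host context, where the pattern is matched against the full URL path.

    RewriteEngine On
    # Proxy /app1/<anything> to the same path on the internal server
    RewriteRule "^/app1/(.*)$" "http://internal1.example.com/$1" [P]

The [P] flag only handles the request direction. Response headers such as Location still have to be fixed up separately.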

Now as soon as Apache re-reads the configuration (the recommended way to do this is with "apachectl graceful"), proxy requests will work, so http://www.example.com/app1/some-path maps to http://internal1.example.com/some-path as required.

However, this is not the whole story. ProxyPass just sends traffic straight through. So when the application servers generate references to themselves (or to other internal addresses), they will be passed straight through to the outside world, where they won't work.

For example, an HTTP redirect often occurs when a user (or author) omits the trailing slash from a directory URL. So a request for http://www.example.com/app1/foo is proxied to http://internal1.example.com/foo, which generates a response:


        HTTP/1.1 302 Found
        Location: http://internal1.example.com/foo/
        (etc)

But seen from the outside world, the net effect is a "No such host" error. The proxy needs to re-map the Location header to its own address space and return a valid URL:



        HTTP/1.1 302 Found
        Location: http://www.example.com/app1/foo/

The command to enable such rewrites in the HTTP Headers is ProxyPassReverse. The Apache documentation suggests the form:


ProxyPassReverse /app1/ http://internal1.example.com/
ProxyPassReverse /app2/ http://internal2.example.com/
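
A minimal reverse-proxy virtual host pairs each ProxyPass with a matching ProxyPassReverse, and could look like the sketch below. The ServerName and back-end hosts are this article's example names. ProxyRequests Off is already the default, but stating it makes the intent explicit.

    <VirtualHost *:80>
        ServerName www.example.com
        # Reverse proxy only: do not act as an open forward proxy
        ProxyRequests Off
        # Forward incoming requests to the back ends...
        ProxyPass        /app1/ http://internal1.example.com/
        ProxyPass        /app2/ http://internal2.example.com/
        # ...and rewrite Location, Content-Location and URI headers on the way back
        ProxyPassReverse /app1/ http://internal1.example.com/
        ProxyPassReverse /app2/ http://internal2.example.com/
    </VirtualHost>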

However, there is a slightly more complex alternative form that I recommend as more robust:



        <Location /app1/>
            ProxyPassReverse /
        </Location>
        <Location /app2/>
            ProxyPassReverse /
        </Location>

Note: this currently fails due to a regression in mod_proxy. If you have a balancer, the ProxyPassReverse balancer:/// form does the right thing and serves as a workaround. Note too that the three slashes are not a typo! Without a balancer, please apply the patch from the bug report or use the other form.

The reason for recommending this is that a problem arises with some application servers. Suppose for example we have a redirect:

    HTTP/1.1 302 Found
    Location: /some/path/to/file.html

This violates RFC 2616, which permits only absolute URLs in Location headers (RFC 7231 later relaxed this to allow relative references). It is also a source of much confusion, not least because the CGI spec has a similar Location header with different semantics, in which relative paths are allowed. There are a lot of broken servers out there! In this instance, the first form of ProxyPassReverse will return the incorrect response:

    HTTP/1.1 302 Found
    Location: /some/path/to/file.html

which, even allowing for error-correcting browsers, is outside the proxy's address space and won't work. The second form fixes this to:

    HTTP/1.1 302 Found
    Location: /app2/some/path/to/file.html

which is still broken, but will at least work in error-correcting browsers. Most browsers will deal with this.

If your backend server uses cookies, you may also need the ProxyPassReverseCookiePath and ProxyPassReverseCookieDomain directives. These are similar to ProxyPassReverse, but deal with the different form of cookie headers. These require mod_proxy from Apache 2.2 (recommended), or a patched version of 2.0.
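
The following is a sketch for a single back end. The path mapping and domain names are assumptions that follow this article's examples. Each cookie directive takes the internal value first and the public value to substitute for it second.

    ProxyPass                    /app1/ http://internal1.example.com/
    ProxyPassReverse             /app1/ http://internal1.example.com/
    # Cookie paths under "/" set by the backend are rewritten under "/app1/"
    ProxyPassReverseCookiePath   /      /app1/
    # "Domain=internal1.example.com" is rewritten to the public host name
    ProxyPassReverseCookieDomain internal1.example.com www.example.com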

Reference link
