Skip to content

Instantly share code, notes, and snippets.

@simonw
Last active January 22, 2019 18:41
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save simonw/1e0fdf2e9b8744b39bd7 to your computer and use it in GitHub Desktop.
Save simonw/1e0fdf2e9b8744b39bd7 to your computer and use it in GitHub Desktop.
How to get Heroku to log full user_agent strings using nginx

How to get Heroku to log full user_agent strings using nginx

Heroku's default logging format omits user agent and referrer strings - but these are useful to have! Especially if you want to be able to run your own analysis on what kind of browsers are using your service.

A default Heroku log line looks like this:

heroku/router:  at=info method=GET path="/page-on-your-site/"
    host=your-site.herokuapp.com
    request_id=76fe0891-38cd-4293-a082-ac075d744197
    fwd="10.10.10.10" dyno=web.1 connect=1ms service=961ms
    status=200 bytes=1274

It isn't possible to change the format output by the Heroku router. But it IS possible to write your own lines to the heroku logs. You can do this from your own application, but it turns out there are some solid reasons to run your own nginx server inside your Heroku dyno (for buffering large requests). If you're doing that already, it's easy to configure that nginx instance to use whatever custom logging format you like.

If you use the Heroku nginx buildpack (or one of its forks), the buildpack will install a nginx configuration file in the following location:

config/nginx.conf.erb

This happens before your app is copied from your git checkout, which means you can replace the nginx config file with your own just by storing your own file in the same location within your application's repository.

This means you can customise the log_format line from that file. Here's the format I'm using, designed to use the semi-standard key="optionally quoted value" format used by other parts of the Heroku stack:

log_format custom 'measure#nginx.service=$request_time request="$request" '
    'status_code=$status request_id=$http_x_request_id '
    'remote_addr="$remote_addr" forwarded_for="$http_x_forwarded_for" '
    'forwarded_proto="$http_x_forwarded_proto" via="$http_via" '
    'body_bytes_sent=$body_bytes_sent referer="$http_referer" '
    'user_agent="$http_user_agent"';
access_log logs/nginx/access.log custom;

Just droy in a copy of the https://github.com/ryandotsmith/nginx-buildpack/blob/master/config/nginx.conf.erb file and make the above alteration to get completely custom logging in your Heroku logs, which looks something like thys:

Feb 05 13:31:55 your-site app/web.1:  measure#nginx.service=0.023
request="GET /blah/ HTTP/1.1" status_code=200 request_id=8ff12bfba-2136
-472b-8b73-e8b5bee4296d remote_addr="10.10.10.10" forwarded_for="1.1.1.1"
forwarded_proto="https" via="1.1 vegur" body_bytes_sent=919 referer="-"
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"

What about escaping special characters?

You might be questioning how reliable this key="quoted value" format style of logging actually is. What if a double quote shows up in the content of the value, for example in the user-agent string? Would that break a log parser that expects to be able to read these key value pairs?

It turns out nginx has you covered: it automatically escapes double quotes (and some other characters) when it interpolates values in the log_format string. Here's an example of a request I generated using a faked user-agent:

requests.get('http://example.com/test/', headers={
    'User-Agent': 'This user-agent has "double quotes" in it'
})

The resulting nginx logline looked like this:

Feb 05 13:25:22 my-heroku-app app/web.1:  measure#nginx.service=0.009
request="GET /test/ HTTP/1.1" status_code=404 request_id=8433b02c-307c-
4d07-b2c6-8d5b37c3e82b remote_addr="10.0.0.0" forwarded_for="1.1.1.1"
forwarded_proto="http" via="1.1 vegur" body_bytes_sent=90 referer="-"
user_agent="This user-agent has \x22double quotes\x22 in it"

The double quotes were escaped as \x22. I also tested special characters in the path and they get the same treatment.

@simonw
Copy link
Author

simonw commented May 2, 2016

Here's a regular expression for parsing data in this key=optionally-quoted-value format:

import re
pairs_re = re.compile('\b([^\s=]+)=("[^"]*"|[^\s]+)')
pairs = dict(pairs_re.findall(logline))

Regular expression visualization

Example output:

{'body_bytes_sent': '6885',
 'forwarded_for': '"74.112.126.60"',
 'forwarded_proto': '"http"',
 'measure#nginx.service': '0.599',
 'referer': '"http://www.example.com/blah/echmg/"',
 'remote_addr': '"11.222.11.211"',
 'request': '"POST /blah/echmg/ HTTP/1.1"',
 'request_id': 'b38c3b48-43f4-4374-ac79-d353a5123bf0',
 'status_code': '200',
 'user_agent': '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"',
 'via': '"1.1 vegur"'}

Note that some of the values will still have their quotes on them, so code should rstrip/lstrip double quotes from values before continuing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment