@vmadman
Created April 27, 2013 06:59
An Apache log format that allows access logs (but not error logs) to be output in JSON format. I found the original here: http://untergeek.com/2012/10/11/getting-apache-to-output-json-for-logstash/ and modified it a good bit for my purposes.
# Access Logs
LogFormat "{ \
\"@vips\":[\"%v\"], \
\"@source\":\"%v%U%q\", \
\"@source_host\": \"%v\", \
\"@source_path\": \"%f\", \
\"@tags\":[\"Apache\",\"Access\"], \
\"@message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
\"@fields\": { \
\"timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
\"clientip\": \"%a\", \
\"duration\": %D, \
\"status\": %>s, \
\"request\": \"%U%q\", \
\"urlpath\": \"%U\", \
\"urlquery\": \"%q\", \
\"method\": \"%m\", \
\"referer\": \"%{Referer}i\", \
\"user-agent\": \"%{User-agent}i\", \
\"bytes\": %B \
} \
}" ls_apache_json
# The catch-all
CustomLog "||/usr/local/bin/udpclient.pl 127.0.0.1 5001" ls_apache_json
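
The udpclient.pl script referenced above is not included in the gist. As a rough idea of what such a piped-log helper might do, here is a minimal sketch in Python (not the author's Perl script), assuming it simply forwards each line it receives to the given host and port as one UDP datagram. The function name send_lines is hypothetical.

```python
import socket


def send_lines(lines, host, port):
    """Send each non-empty log line as one UDP datagram; return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            payload = line.rstrip("\n").encode("utf-8")
            if payload:
                sock.sendto(payload, (host, port))
                sent += 1
    return sent
```

A wrapper script would call something like send_lines(sys.stdin, "127.0.0.1", 5001), with httpd writing one JSON entry per line to the script's standard input.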
@cameronkerrnz

You should note that you can end up with sequences such as \xXX (four characters) and \v (two characters) being emitted by Apache httpd, which will invalidate your JSON and cause logstash not to process that log entry (useful if you're an attacker and don't want a request to be logged). See http://blog.client9.com/2013/08/19/json-log-format-for-apache.html for details.
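
To see the problem concretely, here is a small Python sketch that feeds hand-written sample lines to a strict JSON parser. The sample lines and the helper name parses are made up for illustration; they only mimic the \xXX and \v escapes httpd emits.

```python
import json

# Hand-written samples (assumptions, not real log output).
valid = r'{"path": "/caf\u00e9", "agent": "curl"}'
broken_x = r'{"path": "/caf\xc3\xa9"}'   # httpd-style \xXX escapes
broken_v = r'{"agent": "bad\vagent"}'    # httpd-style \v escape


def parses(line):
    """Return True if line is valid JSON, False if the parser rejects it."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


results = {name: parses(line) for name, line in
           [("valid", valid), ("broken_x", broken_x), ("broken_v", broken_v)]}
```

Only the first sample parses; the other two are rejected because JSON does not define \x or \v as escape sequences.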

But your use of a piped log gave me an idea that I could use something like the following:

CustomLog "|/bin/sed -e s/\\v/\\u0013/g -e s/\\x/\\u00/ >> /var/log/httpd/access_log"

So thanks for the inspiration.

FWIW, here's what I'm logging, which shows some other useful bits and pieces, including logging cookies, the redirect location, various request headers, and the content type.

# Note also that httpd will escape " to \", plus various others... (see the docs),
# which (almost) matches up with JSON's requirements, with the exception of
# \xXX (should be \u00XX in JSON) and \v (should be \u0013 in JSON)
#
# THINGS TO NOTE/CHECK/ADD/REMOVE:
#    Any session cookies are good to log 
#       Example: JSESSION (change/remove as required)
#    Any particular HTTP request headers (particularly for servers behind a reverse proxy)
#       The REMOTE_USER example shows this (for things protected by a type of web SSO)
#
LogFormat "{ \
 \"@timestamp\":\"%{%FT%T%z}t\", \
 \"client_ip\":\"%a\", \
 \"client_port\":\"%{remote}p\", \
 \"server_ip\":\"%A\", \
 \"X-Forwarded-For\":\"%{X-Forwarded-For}i\", \
 \"user\":\"%u\", \
 \"REMOTE_USER\":\"%{REMOTE_USER}i\", \
 \"JSESSIONID\":\"%{JSESSIONID}C\", \
 \"pid\":\"%P\", \
 \"protocol\":\"%H\", \
 \"http_method\":\"%m\", \
 \"vhost\":\"%{Host}i\", \
 \"service_port\":\"%p\", \
 \"path\":\"%U\", \
 \"query_string\":\"%q\", \
 \"referer\":\"%{Referer}i\", \
 \"user_agent\":\"%{User-agent}i\", \
 \"response_code\":\"%>s\", \
 \"response_location\":\"%{Location}o\", \
 \"Content-Type\":\"%{Content-Type}o\", \
 \"bytes_in\":\"%I\", \
 \"bytes_out\":\"%O\", \
 \"keepalive\":\"%X\", \
 \"duration_micros\":\"%D\" \
 }" logstash_json
 
CustomLog logs/access_log.logstash_json logstash_json

If you're duplicating logs (e.g. you have some other log shipper that is tailing the logs, and playing catch-up if the log server was unavailable for a while), then you may like to rotate that log file more frequently, if necessary more than once per day. It's helpful to realise that files under /etc/logrotate.d/ can be seen as standalone logrotate.conf files, so you might have a cron job that runs /usr/bin/logrotate /etc/logrotate.d/logstash_json.conf every few hours or so. But then, if you're handling a significant volume, perhaps you should be using the rotatelogs command that comes with Apache (see the documentation for 'Piped Logs').

Thanks for sharing!
Cameron

@jonjensen

Clever postprocessing of the non-JSON \x and \v escapes!

However, I think this one is wrong:

s/\v/\u0013/g

because \v is 13 in octal, but b in hex, so it should be:

s/\v/\u000b/g

Also beware that this will wrongly convert already-escaped things like a literal \\x (an escaped backslash followed by x), which should stay as is, as opposed to \x.
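
One way to avoid touching an escaped backslash is to have a single pattern consume \\ first, so the x or v that follows it is never examined. The following is only a sketch in Python rather than sed; the name reescape is hypothetical, and it assumes every \x is followed by exactly two hex digits.

```python
import json
import re

# Alternatives are tried in order: an escaped backslash (kept as is), \v, or \xXX.
_ESCAPE = re.compile(r'\\\\|\\v|\\x([0-9A-Fa-f]{2})')


def reescape(line):
    """Rewrite httpd's \\v and \\xXX escapes as JSON \\u00XX escapes."""
    def fix(match):
        token = match.group(0)
        if token == '\\\\':          # already-escaped backslash: leave alone
            return token
        if token == '\\v':           # vertical tab is 0x0b
            return '\\u000b'
        return '\\u00' + match.group(1).lower()
    return _ESCAPE.sub(fix, line)


fixed = reescape(r'{"p": "a\x0bb\vc"}')
untouched = reescape(r'{"p": "C:\\x41"}')
```

Note that mapping each byte to \u00XX treats it as a single code point, so multi-byte UTF-8 sequences still come out mangled even though the line now parses.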

@cameronkerrnz

Quite right regarding \v

For \x, I think the right thing to do would be to read in strings of contiguous \xXX\xYY..., unescape them to raw bytes, and attempt to treat them as UTF-8. If that works, then output them as Unicode characters (encoded in UTF-8), and if not, then output each byte in \u00xx form.
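
As a rough sketch of that idea (in Python, not an actual logstash plugin), the function below collects each run of \xXX escapes, tries to decode the bytes as UTF-8, and falls back to per-byte \u00xx output when that fails. The name decode_hex_runs is hypothetical.

```python
import json
import re

# An escaped backslash (left alone) or a run of one or more \xXX escapes.
_RUN = re.compile(r'\\\\|(?:\\x[0-9A-Fa-f]{2})+')


def decode_hex_runs(line):
    """Replace runs of \\xXX with UTF-8 text where valid, else \\u00xx per byte."""
    def fix(match):
        token = match.group(0)
        if token == '\\\\':
            return token
        raw = bytes.fromhex(token.replace('\\x', ''))
        try:
            text = raw.decode('utf-8')
        except UnicodeDecodeError:
            # Not valid UTF-8: emit every byte of the run in \u00xx form.
            return ''.join('\\u%04x' % byte for byte in raw)
        # json.dumps escapes anything JSON requires (e.g. control characters);
        # strip the surrounding quotes it adds.
        return json.dumps(text, ensure_ascii=False)[1:-1]
    return _RUN.sub(fix, line)


decoded = decode_hex_runs(r'{"path": "/\xe4\xb8\xad\xe6\x96\x87"}')
```

A run that mixes valid and invalid bytes falls back as a whole in this sketch; splitting such runs more finely would be a further refinement.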

A literal \x should not appear without already being escaped to \\x, but yes, care is needed.

In my deployment, I'm now using filebeat to send these logs to a logstash instance, so I'm toying with the idea of creating a filter plugin (I'm imagining calling this logstash-filter-reescape-c-to-json) which should do a good job of this, ready for actual JSON parsing.

The most common requirement for this is for when path names legitimately contain characters outside the US-ASCII range. Notably Chinese. It will help to clean up the error signal too, which could be useful later as a signal of attack.

Speaking of which, if any of the unescaped sequences mention dangerous characters such as \0 then I could set a tag, such as dangerous_escape_encountered
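
A minimal sketch of that tagging step, again in Python rather than as a real logstash filter. The set of "dangerous" bytes is an assumption (only NUL here), and escape_tags is a hypothetical name.

```python
import re

# An escaped backslash (skipped) or a single \xXX escape.
_HEX = re.compile(r'\\\\|\\x([0-9A-Fa-f]{2})')

# Assumption: only the NUL byte counts as dangerous; extend as needed.
DANGEROUS_BYTES = {0x00}


def escape_tags(line):
    """Return the tags to attach to a log line, based on its \\xXX escapes."""
    for match in _HEX.finditer(line):
        digits = match.group(1)
        if digits is not None and int(digits, 16) in DANGEROUS_BYTES:
            return ['dangerous_escape_encountered']
    return []


tags = escape_tags(r'{"path": "/etc/passwd\x00.jpg"}')
```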

@alexjurkiewicz

alexjurkiewicz commented Nov 15, 2018

Small correction, the sed customlog command needs to be spawned with a shell (|$ vs |) for the >> redirect to have meaning.

Corrected line (with \v fix as well):

CustomLog "|$/bin/sed -e s/\\v/\\u000b/g -e s/\\x/\\u00/ >> /var/log/httpd/access_log" myformat

Doc reference: https://httpd.apache.org/docs/2.4/logs.html#piped

@alexjurkiewicz

Actually, even simpler is to use the new feature of Filebeat that handles JSON decode errors automatically:

  - type: log
    enabled: true
    paths:
      - /var/log/mylog.json
    json:
      keys_under_root: true
      add_error_key: true

If parsing fails, you'll get an entry with fields (among others):

{
  "error.message": "Error decoding JSON: invalid character 'o' in literal false (expecting 'a')",
  "error.type": "json",
  "message": "fooooo"
}

Then you can set up CustomLog to write directly to a file, bypassing sed.

@pacohope

pacohope commented Jun 7, 2019

"Small correction, the sed customlog command needs to be spawned with a shell (|$ vs |) for the >> redirect to have meaning."

Seems to me that parts of the access_log contain user input from untrusted sources on the Internet. Is it really safe to pipe that through sed on the web server? The sed command will be running as the web server user and this might create opportunities for command injection. Likewise that's like one sed invocation per web request. Surely this will perform badly on a highly loaded server. Apache goes to great lengths to be scalable, but if we invoke a whole unix process on each and every log line, I think that would significantly hurt scalability.

@jonjensen

"Likewise that's like one sed invocation per web request. Surely this will perform badly on a highly loaded server."

@pacohope Luckily Apache 2.4 (and < 2.4 when using the || form) starts the filtering process once at Apache startup time, and pushes data through that one always-running process, so no, it doesn't add much overhead assuming the filter program is efficient, and it doesn't respawn anew for each request:

https://httpd.apache.org/docs/2.4/logs.html#piped
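
To illustrate the always-running model, here is a sketch of what a custom filter program could look like in Python: a single loop that stays alive and handles one line at a time as httpd writes it. The function run_filter and the transform argument are hypothetical, and the in-memory streams only stand in for the real pipe and log file.

```python
import io


def run_filter(source, sink, transform):
    """Copy log lines from source to sink through transform, one at a time.

    Returns the number of lines handled once source reaches end of file.
    """
    handled = 0
    for line in source:            # blocks until the next entry arrives
        sink.write(transform(line))
        sink.flush()               # keep the log current for anything tailing it
        handled += 1
    return handled


# Stand-in for the pipe from httpd: two entries, then end of file.
demo_in = io.StringIO('{"status": 200}\n{"status": 404}\n')
demo_out = io.StringIO()
handled = run_filter(demo_in, demo_out,
                     lambda line: line.replace("status", "response_code"))
```

In a real deployment the source would be sys.stdin and the sink a file opened in append mode, with the script named in a piped CustomLog directive; the loop ends only when httpd closes the pipe.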
