@cgivre
Last active August 18, 2017 11:17
How to Read Web Server Logs with Apache Drill

Reading Web Server Logs

As of version 1.9, Apache Drill can natively read and query web server logs. To configure Drill to read server logs, add the following entry to the formats section of the dfs storage plugin configuration:

"httpd": {
  "type": "httpd",
  "logFormat": "%h %t \"%r\" %>s %b \"%{Referer}i\" \"%{user-agent}i\"",
  "timestampFormat": null
}

The logFormat value must match the format of your log files; otherwise Drill cannot parse them correctly. The table below lists the fields that can be included in log files. The timestampFormat setting is optional: if you supply a format for the time stamp, Drill parses the times in the log files into Drill dates.
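To illustrate why the format has to match, here is a minimal Python sketch (not Drill's parser) that hand-translates the logFormat shown above into a regular expression. The regex, the group names, and both sample lines are assumptions made up for the example; the point is only that a line written in a different format fails to parse at all.

```python
import re
from datetime import datetime

# Hand-written regex mirroring: %h %t "%r" %>s %b "%{Referer}i" "%{user-agent}i"
# (an illustration only; Drill derives its parser from the logFormat string itself)
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) '              # %h
    r'\[(?P<time>[^\]]+)\] '       # %t
    r'"(?P<firstline>[^"]*)" '     # "%r"
    r'(?P<status>\d{3}) '          # %>s
    r'(?P<bytes>\d+|-) '           # %b ("-" when no body was sent)
    r'"(?P<referer>[^"]*)" '       # "%{Referer}i"
    r'"(?P<user_agent>[^"]*)"'     # "%{user-agent}i"
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the default Apache time stamp into a timezone-aware datetime.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

# Hypothetical line written in the format configured above.
sample = (
    '192.0.2.10 [18/Aug/2017:11:17:00 +0000] '
    '"GET /index.html?id=7 HTTP/1.1" 200 512 '
    '"http://example.com/" "Mozilla/5.0"'
)
record = parse_line(sample)

# Hypothetical line in a different format (it adds %l and %u), so it is rejected.
other_format = '192.0.2.10 - - [18/Aug/2017:11:17:00 +0000] "GET / HTTP/1.1" 200 512'
rejected = parse_line(other_format)
```

Here `record` holds the seven fields of the first line, while `rejected` is `None` because the second line carries two extra fields that the configured format does not describe.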

HTTPD Format Strings

Format String Variable Name
%a connection.client.ip
%{c}a connection.client.peerip
%A connection.server.ip
%B response.body.bytes
%b response.body.bytesclf
%{Foobar}C request.cookies.*
%D server.process.time
%{Foobar}e server.environment.*
%f server.filename
%h connection.client.host
%H request.protocol
%{Foobar}i request.header.*
%k connection.keepalivecount
%l connection.client.logname
%L request.errorlogid
%m request.method
%{Foobar}n server.module_note.*
%{Foobar}o response.header.*
%p request.server.port.canonical
%{canonical}p connection.server.port.canonical
%{local}p connection.server.port
%{remote}p connection.client.port
%P connection.server.child.processid
%{pid}P connection.server.child.processid
%{tid}P connection.server.child.threadid
%{hextid}P connection.server.child.hexthreadid
%q request.querystring
%r request.firstline
%R request.handler
%s request.status.original
%>s request.status.last
%t request.receive.time
%{msec}t request.receive.time.begin.msec
%{begin:msec}t request.receive.time.begin.msec
%{end:msec}t request.receive.time.end.msec
%{usec}t request.receive.time.begin.usec
%{begin:usec}t request.receive.time.begin.usec
%{end:usec}t request.receive.time.end.usec
%{msec_frac}t request.receive.time.begin.msec_frac
%{begin:msec_frac}t request.receive.time.begin.msec_frac
%{end:msec_frac}t request.receive.time.end.msec_frac
%{usec_frac}t request.receive.time.begin.usec_frac
%{begin:usec_frac}t request.receive.time.begin.usec_frac
%{end:usec_frac}t request.receive.time.end.usec_frac
%T response.server.processing.time
%u connection.client.user
%U request.urlpath
%v connection.server.name.canonical
%V connection.server.name
%X response.connection.status
%I request.bytes
%O response.bytes
%{cookie}i request.cookies
%{set-cookie}o response.cookies
%{user-agent}i request.user-agent
%{referer}i request.referer
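
As a rough way to check which variables a given logFormat should yield, the following Python sketch splits a format string into directives and looks each one up in a small subset of the table above. The `FIELD_NAMES` dictionary, the `DIRECTIVE` regex, and the `fields_for` helper are all hypothetical names introduced for this example, and lowercasing the `{...}` argument is an assumption that only suits header names.

```python
import re

# Subset of the table above: format string -> variable name.
FIELD_NAMES = {
    "%h": "connection.client.host",
    "%t": "request.receive.time",
    "%r": "request.firstline",
    "%>s": "request.status.last",
    "%b": "response.body.bytesclf",
    "%{referer}i": "request.referer",
    "%{user-agent}i": "request.user-agent",
}

# A directive is '%', an optional {argument}, an optional '<' or '>', then one letter.
DIRECTIVE = re.compile(r'%(?:\{[^}]*\})?[<>]?[A-Za-z]')

def fields_for(log_format):
    """List the variable name for each directive found in log_format."""
    names = []
    for token in DIRECTIVE.findall(log_format):
        # Assumption: the {argument} is a header name, so compare it case-insensitively.
        key = re.sub(r'\{[^}]*\}', lambda m: m.group(0).lower(), token)
        names.append(FIELD_NAMES.get(key, "unknown: " + token))
    return names

fmt = '%h %t "%r" %>s %b "%{Referer}i" "%{user-agent}i"'
for name in fields_for(fmt):
    print(name)
```

Directives missing from the dictionary are reported as `unknown: ...`, which makes it easy to spot a format string that needs more entries from the table.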

Additional Functionality

In addition to the ability to read raw log files, there are two functions intended to be used whilst analyzing log files:

  • parse_url(<url>): This function accepts a URL as an argument and returns a map of the URL's protocol, authority, host, and path.
  • parse_query(<query_string>): This function accepts a query string and returns a map of the key/value pairs submitted in the request.
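
To give a feel for the shape of those results, here is a Python sketch that produces similar maps with the standard library's `urllib.parse`. It is an analogy rather than Drill's implementation: the `parse_url_sketch` and `parse_query_sketch` names are hypothetical, the URL is made up, and the Drill functions may return additional or differently named keys.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_url_sketch(url):
    """Return the four parts named in the parse_url description as a dict."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "authority": parts.netloc,   # host plus optional port
        "host": parts.hostname,
        "path": parts.path,
    }

def parse_query_sketch(query_string):
    """Return the query string's key/value pairs as a dict of strings."""
    return dict(parse_qsl(query_string))

url_map = parse_url_sketch("http://example.com:8080/search?q=drill&page=2")
query_map = parse_query_sketch("q=drill&page=2")

print(url_map)
print(query_map)
```

Note that every value in `query_map` is a string, so numeric parameters such as `page` would still need a cast before being used in arithmetic.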

A separate function, available at https://github.com/cgivre/drill-useragent-function, parses User Agent strings and returns a map of the pertinent information.
