cgivre/drill-httpd-docs.md

## Reading Web Server Logs

As of version 1.9, Apache Drill can natively ingest and query web server logs. To configure Drill to read server logs, add an `httpd` entry to the `formats` section of the `dfs` storage plugin configuration:

```json
"httpd": {
  "type": "httpd",
  "logFormat": "%h %t \"%r\" %>s %b \"%{Referer}i\" \"%{user-agent}i\"",
  "timestampFormat": null
}
```

The `logFormat` value must match the format of your log files exactly; otherwise, Drill will not be able to parse your logs correctly. The `timestampFormat` setting is optional, but if you supply a format for the timestamp, Drill will parse the times in the log files into Drill dates. The table below lists the fields that can be included in log files.
### HTTPD Format Strings

In the parameterized rows, `Foobar` stands for the name of the cookie, environment variable, request header, module note, or response header to log.

| Format String | Variable Name |
|---|---|
| `%a` | `connection.client.ip` |
| `%{c}a` | `connection.client.peerip` |
| `%A` | `connection.server.ip` |
| `%B` | `response.body.bytes` |
| `%b` | `response.body.bytesclf` |
| `%{Foobar}C` | `request.cookies.*` |
| `%D` | `server.process.time` |
| `%{Foobar}e` | `server.environment.*` |
| `%f` | `server.filename` |
| `%h` | `connection.client.host` |
| `%H` | `request.protocol` |
| `%{Foobar}i` | `request.header.*` |
| `%k` | `connection.keepalivecount` |
| `%l` | `connection.client.logname` |
| `%L` | `request.errorlogid` |
| `%m` | `request.method` |
| `%{Foobar}n` | `server.module_note.*` |
| `%{Foobar}o` | `response.header.*` |
| `%p` | `request.server.port.canonical` |
| `%{canonical}p` | `connection.server.port.canonical` |
| `%{local}p` | `connection.server.port` |
| `%{remote}p` | `connection.client.port` |
| `%P` | `connection.server.child.processid` |
| `%{pid}P` | `connection.server.child.processid` |
| `%{tid}P` | `connection.server.child.threadid` |
| `%{hextid}P` | `connection.server.child.hexthreadid` |
| `%q` | `request.querystring` |
| `%r` | `request.firstline` |
| `%R` | `request.handler` |
| `%s` | `request.status.original` |
| `%>s` | `request.status.last` |
| `%t` | `request.receive.time` |
| `%{msec}t` | `request.receive.time.begin.msec` |
| `%{begin:msec}t` | `request.receive.time.begin.msec` |
| `%{end:msec}t` | `request.receive.time.end.msec` |
| `%{usec}t` | `request.receive.time.begin.usec` |
| `%{begin:usec}t` | `request.receive.time.begin.usec` |
| `%{end:usec}t` | `request.receive.time.end.usec` |
| `%{msec_frac}t` | `request.receive.time.begin.msec_frac` |
| `%{begin:msec_frac}t` | `request.receive.time.begin.msec_frac` |
| `%{end:msec_frac}t` | `request.receive.time.end.msec_frac` |
| `%{usec_frac}t` | `request.receive.time.begin.usec_frac` |
| `%{begin:usec_frac}t` | `request.receive.time.begin.usec_frac` |
| `%{end:usec_frac}t` | `request.receive.time.end.usec_frac` |
| `%T` | `response.server.processing.time` |
| `%u` | `connection.client.user` |
| `%U` | `request.urlpath` |
| `%v` | `connection.server.name.canonical` |
| `%V` | `connection.server.name` |
| `%X` | `response.connection.status` |
| `%I` | `request.bytes` |
| `%O` | `response.bytes` |
| `%{cookie}i` | `request.cookies` |
| `%{set-cookie}o` | `response.cookies` |
| `%{user-agent}i` | `request.user-agent` |
| `%{referer}i` | `request.referer` |
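
The sketch below shows why `logFormat` has to mirror the log files exactly. It is written in Python and is not Drill's implementation. It turns the format from the sample configuration into a regular expression, then splits one made-up log line into variable names taken from the table above.

The sketch understands only the seven directives in that format. `compile_format`, `parse_line`, and the per-directive regex fragments are hypothetical helpers written for this illustration.

```python
import re
from datetime import datetime

# Regex fragments for ONLY the directives in the sample logFormat (an assumption
# made for this sketch); the variable names come from the format-string table.
DIRECTIVES = {
    "%h": ("connection.client.host", r"\S+"),
    "%t": ("request.receive.time", r"\[[^\]]+\]"),
    "%r": ("request.firstline", r'[^"]*'),
    "%>s": ("request.status.last", r"\d{3}"),
    "%b": ("response.body.bytesclf", r"\d+|-"),
    "%{Referer}i": ("request.referer", r'[^"]*'),
    "%{user-agent}i": ("request.user-agent", r'[^"]*'),
}

# Matches %{...}i header directives, %>s, and the single-letter directives above.
TOKEN = re.compile(r"%\{[^}]+\}i|%>s|%[htrb]")


def compile_format(log_format):
    """Turn a log format into a regex with one capture group per directive."""
    names, parts, pos = [], [], 0
    for m in TOKEN.finditer(log_format):
        parts.append(re.escape(log_format[pos:m.start()]))  # literal text between directives
        name, fragment = DIRECTIVES[m.group()]  # KeyError for unsupported directives
        names.append(name)
        parts.append(f"({fragment})")
        pos = m.end()
    parts.append(re.escape(log_format[pos:]))  # trailing literal text
    return re.compile("".join(parts)), names


def parse_line(log_format, line):
    """Return {variable name: raw value}, or None if the line does not fit the format."""
    pattern, names = compile_format(log_format)
    m = pattern.fullmatch(line)
    return dict(zip(names, m.groups())) if m else None


# The sample logFormat from the configuration, with the JSON escaping removed.
LOG_FORMAT = '%h %t "%r" %>s %b "%{Referer}i" "%{user-agent}i"'
LINE = ('192.0.2.10 [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" '
        '200 2326 "http://example.com/start.html" "Mozilla/5.0"')
# The same request in Apache's combined log format, whose extra %l and %u
# fields ("- -") are not part of LOG_FORMAT.
MISMATCH = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" '
            '200 2326 "http://example.com/start.html" "Mozilla/5.0"')

fields = parse_line(LOG_FORMAT, LINE)
print(fields["request.status.last"])     # 200
print(parse_line(LOG_FORMAT, MISMATCH))  # None

# Roughly what a timestamp format buys you: the raw %t text becomes a date-time.
# Python strptime codes are used here; Drill's timestampFormat syntax may differ.
when = datetime.strptime(fields["request.receive.time"].strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
print(when.isoformat())  # 2000-10-10T13:55:36-07:00
```

A line written with a different format simply fails to match, which is the failure the `logFormat` requirement guards against. The last step hints at what `timestampFormat` adds: the bracketed `%t` text becomes a real date-time value.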


### Additional Functionality

In addition to reading raw log files, Drill provides two functions that are useful when analyzing logs:

- `parse_url(<url>)`: accepts a URL and returns a map of the URL's protocol, authority, host, and path.
- `parse_query(<query_string>)`: accepts a query string and returns the variables submitted in the request as key/value pairs.
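
`parse_url` and `parse_query` are called from SQL. Python's standard `urllib.parse` module offers a quick way to preview the kind of output they describe.

This is an analogy only. `parse_url_like` and `parse_query_like` are hypothetical names, and the dictionary keys simply mirror the fields listed above. Drill's own functions may return additional keys, or handle edge cases such as repeated keys, missing ports, and encoding differently.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_url_like(url):
    """Hypothetical stand-in for Drill's parse_url: protocol, authority, host, path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "authority": parts.netloc,  # host plus optional user info and port
        "host": parts.hostname,     # lowercased host name without the port
        "path": parts.path,
    }


def parse_query_like(query):
    """Hypothetical stand-in for Drill's parse_query: variables as key/value pairs.

    If a key repeats, the last value wins in this sketch.
    """
    return dict(parse_qsl(query))


url = "http://www.example.com:8080/shop/item.html?id=42&color=blue"
print(parse_url_like(url))
# {'protocol': 'http', 'authority': 'www.example.com:8080', 'host': 'www.example.com', 'path': '/shop/item.html'}
print(parse_query_like(urlsplit(url).query))
# {'id': '42', 'color': 'blue'}
```

In a log analysis, the URL passed to `parse_url` would typically come from a field such as `request.referer`.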

A separate function, available at https://github.com/cgivre/drill-useragent-function, can parse User-Agent strings and return a map of all the pertinent information.