@holybit
Last active August 29, 2015 14:11
Log Layout Design Questions

Current Apache Log Format

Old-school key=value pairs. The relevant Apache httpd.conf snippet follows:

LogFormat "site=%{site_name}e ip=%h datetime=\"%{%F %H:%M:%S %z}t\" timestamp=%{%s}t host=%V request=\"%r\" status=%>s response_size=%b response_time=%>D referer=\"%{Referer}i\" user_agent=\"%{User-Agent}i\" filename=%f session_id=%{rp_session_id}n tracking_id=%{RPID}C user_id=%{user_id}n realm_id=%{realm_id}n superuser_id=%{superuser_id}n" custom_log
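
For reference, here is a minimal Python sketch of how a line in this format could be parsed into a record. It is not the existing Perl parser; the function name `parse_kv_line` and the sample line are made up for illustration. It assumes keys are plain word characters and that a value is either double-quoted or contains no whitespace. Apache backslash-escapes embedded quotes in fields such as `%r`, so the pattern steps over escaped characters, but it does not unescape them.

```python
import re

# key=value, where the value is either a double-quoted string (which may
# contain backslash-escaped characters) or a run of non-whitespace characters.
PAIR_RE = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S*))')


def parse_kv_line(line):
    """Parse one key=value access-log line into a dict of strings."""
    record = {}
    for match in PAIR_RE.finditer(line):
        key, quoted, bare = match.groups()
        # Exactly one of the two value groups matches; the other is None.
        record[key] = quoted if quoted is not None else bare
    return record


# Hypothetical sample line following the field order of the LogFormat above.
sample = (
    'site=example ip=203.0.113.7 datetime="2014-12-10 09:15:02 -0500" '
    'timestamp=1418220902 host=www.example.com '
    'request="GET /index.html HTTP/1.1" status=200 response_size=5120 '
    'response_time=1834 referer="-" '
    'user_agent="Mozilla/5.0 (X11; Linux x86_64)" '
    'filename=/var/www/index.html session_id=- tracking_id=- '
    'user_id=42 realm_id=7 superuser_id=-'
)
record = parse_kv_line(sample)
```

Quoted values are consumed whole, so an `=` inside a request line or user agent does not start a new pair.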

Log Storage

Logs are currently parsed by a Perl script and then stored in Hadoop Hive for a number of business-critical use cases.

Going Forward

We want to start using ELK soon, but we cannot store the logs in Elasticsearch for another month or so because the ES cluster is not yet ready.

The big caveat is that logs must continue to load to Hadoop Hive.

Questions

Should we change the Apache output to JSON or leave it as is (i.e., key=value)? If we feed JSON logs into Logstash, we will have to store them on disk for eventual insertion into Elasticsearch and, at the same time, pivot the data back to key=value pairs and emit them to a file that can be loaded into Hadoop Hive.
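
To make the "pivot back" step concrete, here is a rough Python sketch of turning one JSON log record into a key=value line. Everything in it is an assumption for illustration: that the JSON field names would match the current key names, that the same four fields stay quoted, that missing fields are written as `-`, and the function name `json_to_kv` itself. It is not a Logstash configuration.

```python
import json

# Field order copied from the current key=value layout.
FIELDS = [
    "site", "ip", "datetime", "timestamp", "host", "request", "status",
    "response_size", "response_time", "referer", "user_agent", "filename",
    "session_id", "tracking_id", "user_id", "realm_id", "superuser_id",
]

# Fields that are wrapped in double quotes in the current layout.
QUOTED = {"datetime", "request", "referer", "user_agent"}


def json_to_kv(json_line):
    """Convert one JSON log record into a key=value line."""
    record = json.loads(json_line)
    parts = []
    for name in FIELDS:
        # Assumption: a missing field is written as "-", as Apache does.
        value = str(record.get(name, "-"))
        if name in QUOTED:
            # Escape backslashes first, then embedded double quotes.
            value = value.replace("\\", "\\\\").replace('"', '\\"')
            value = '"' + value + '"'
        parts.append(name + "=" + value)
    return " ".join(parts)


# Hypothetical record with only a few fields present.
example = json.dumps(
    {"site": "example", "request": "GET /a?x=1 HTTP/1.1", "status": 200}
)
kv_line = json_to_kv(example)
```

The fixed `FIELDS` list keeps the output column order stable for the Hive load regardless of the key order in the JSON.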

What are folks' general thoughts on using JSON for this scenario? Or should we just stay on the old key=value format?
