mattbornski / apache-logs-hive.sql
Created November 13, 2012 00:08 — forked from emk/apache-logs-hive.sql
Apache log analysis with Hadoop, Hive and HBase
-- This is a Hive program. Hive's query language, HiveQL, is SQL-like and
-- compiles into Hadoop MapReduce jobs. It's very popular among analysts at
-- Facebook, where Hive was developed, because it lets them query enormous
-- Hadoop data stores using a language much like SQL.
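--
-- For illustration only (the table `access_logs` and its `host` column are
-- hypothetical names, not part of this script), a query such as
--
--   SELECT host, COUNT(*) FROM access_logs GROUP BY host;
--
-- becomes, roughly, a single MapReduce job: mappers emit (host, partial count)
-- pairs and reducers sum those counts for each host. Prefixing the query with
-- EXPLAIN prints the stage plan that Hive actually generates.
--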
-- Our logs are stored in the Hadoop Distributed File System (HDFS), in the
-- directory /logs/randomhacks.net/access. They're ordinary Apache
-- logs, compressed with gzip (*.gz).
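--
-- A quick way to see which files are there is to run an HDFS shell command
-- from inside the Hive CLI. It is shown commented out so that this script's
-- behavior is unchanged:
--
--   dfs -ls /logs/randomhacks.net/access;
--
-- Hive's default text input format decompresses .gz files transparently.
-- Gzip is not splittable, though, so each file is read by a single mapper.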
--
-- We want to pretend that these gzipped log files are a database table,