emk / apache-logs-hive.sql
Created January 3, 2012 18:01
Apache log analysis with Hadoop, Hive and HBase
-- This is a Hive program. Hive's query language, HiveQL, is SQL-like
-- and compiles into Hadoop MapReduce jobs. It's very popular among
-- analysts at Facebook, because it allows them to query enormous
-- Hadoop data stores with familiar SQL syntax.
--
-- Our logs are stored on the Hadoop Distributed File System, in the
-- directory /logs/randomhacks.net/access. They're ordinary Apache
-- logs in *.gz format.
--
-- We want to pretend that these gzipped log files are a database table,