
@sheeplogh
Created September 3, 2012 02:34
[urchin] logformat conf for Amazon S3
#-----------------------------------------------------------------------------------------------
# Urchin Logformat Map - Custom Format
#
# Urchin uses this file to determine which fields are contained in the log file.
# The file contains name/value pairs which affect the parsing of the data fields.
# The log file can contain lines with up to two different formats which are
# denoted as Primary and Secondary in the name/value pairs below.
#
# Lines beginning with a '#' are ignored. Fields in this file are separated by
# whitespace (spaces or tabs). Any fields that have whitespace in them must be
# surrounded by quotes. The ID numbers used in PrimaryPositions and
# SecondaryPositions are defined in the fieldlist.txt file. Values entered as "-"
# are considered to be empty. To enter a literal "-" character, escape it with
# a backslash. To enter a literal backslash, escape it with a backslash.
#
# Name                Available Value(s)
# -----               ------------------
# PrimaryPositions    AUTO or comma-separated list of field IDs from fieldlist.txt.
#                     Use 0 as the field ID for unused fields in your log file.
# PrimaryKey          - or character that distinguishes this line from the Secondary format
# PrimaryContent      Hit, Item, or Transaction
# SecondaryPositions  AUTO, -, or comma-separated list of field IDs from fieldlist.txt.
#                     Use 0 as the field ID for unused fields in your log file.
# SecondaryKey        - or character that distinguishes this line from the Primary format
# SecondaryContent    Hit, Item, Transaction, or -
# CommentKey          - or character that signals the line in the log file is a comment line
# FieldSeparator[1,2] - or character used to separate the fields (space = \s, tab = \t)
# QuotesEscapeSep     Yes or No. Specifies whether to ignore field separators inside quotes
# BracketsEscapeSep   Yes or No. Specifies whether to ignore field separators inside brackets
# MergeSuccessiveSep  Yes or No. Specifies whether to treat successive separators as one
# CleanWhiteSpace     Yes or No. Specifies whether to remove white space from the ends of each field
# StatusRequired      Yes or No. Specifies whether hits must have a valid status value
# CustomDateFormat    - or format used by strptime (%Y = 4-digit year, %m = month, %d = day)
# CustomTimeFormat    - or format used by strptime (%H = hour 00-23, %M = minutes, %S = seconds)
# TimeZoneOffset      0 or +/-HHMM offset from GMT in which the date/time is recorded. Set to 0
#                     for timestamps in GMT or for timestamps that contain timezone offsets.
#-----------------------------------------------------------------------------------------------
PrimaryPositions: "201,202,3,12,203,204,205,206,6,10,207,11,208,209,210,15,13,211"
SecondaryPositions: -
PrimaryKey: -
SecondaryKey: -
PrimaryContent: HIT
SecondaryContent: -
CommentKey: #
FieldSeparator1: \s
FieldSeparator2: \t
QuotesEscapeSep: YES
BracketsEscapeSep: YES
MergeSuccessiveSep: NO
CleanWhiteSpace: NO
StatusRequired: YES
CustomDateFormat: "%m/%d/%y"
CustomTimeFormat: "%H:%M:%S"
TimeZoneOffset: 0
#-----------------------------------------------------------------------------------------------
# This file can also be used to specify custom filters, using the format listed
# below. The field definitions are as follows:
#
# Field       Definition/Value(s)
# ------      -------------------
# ID          ID number of the field to store the data in
# TYPE        CALC (represents a custom calculated field)
# NAME        User-friendly name for the field specified in ID
# SRC-A       ID number of the first source field (1-300)
# EXP-A       Regular expression used to capture data from SRC-A
# SRC-B       ID number of the second source field (1-300)
# EXP-B       Regular expression used to capture data from SRC-B
# CONSTRUCT   Format string that specifies which parts to combine from SRC-A and SRC-B.
#             Matched pieces of the regular expressions are referenced as $A1, where A is
#             the source field and 1 is the first captured group. For example, "$A1|$B1"
#             puts the first captured part from A, then a '|' character, then the first
#             captured part from B.
# REQUIRE     A, B, Both, Either, or -. Specifies which fields must have data before the
#             output is created.
# OVERRIDE    Yes or No. Specifies whether to overwrite data in the ID field if it already
#             contains data.
# CASE        Yes or No. Specifies whether filters are case-sensitive (default is No).
#
# Custom Calculated Fields (#226-300)
#ID  TYPE NAME               SRC-A EXP-A SRC-B EXP-B CONSTRUCT REQUIRE OVERRIDE CASE
#-----------------------------------------------------------------------------------------------
#226 CALC custom_calc_field1 10    (.*)  11    (.*)  $A1|$B1   BOTH    YES      NO
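The separator settings in this map (FieldSeparator1/2, QuotesEscapeSep, BracketsEscapeSep, MergeSuccessiveSep) matter for S3 logs. The request time is written in square brackets with a space before the timezone offset, and the request URI, referer, and user agent are wrapped in double quotes. The Python sketch below approximates the documented option semantics. It shows how one S3 access-log line splits into 18 tokens that pair positionally with the PrimaryPositions IDs. It is not Urchin's parser. The function names `split_fields` and `parse_primary` and the sample line are made up for illustration. Quote and bracket characters are kept in the values, and CleanWhiteSpace and StatusRequired are not modeled.

```python
# Field IDs copied from the PrimaryPositions line of the map.
PRIMARY_POSITIONS = [201, 202, 3, 12, 203, 204, 205, 206, 6, 10,
                     207, 11, 208, 209, 210, 15, 13, 211]


def split_fields(line, seps=" \t", quotes_escape=True, brackets_escape=True,
                 merge_successive=False):
    """Split one log line following the map's separator options (a sketch).

    seps mirrors FieldSeparator1/2 (space and tab). Quote and bracket
    characters are kept in the returned values.
    """
    fields, current = [], []
    in_quotes = False
    bracket_depth = 0
    for ch in line.rstrip("\n"):
        # Track whether we are inside "..." or [...] (QuotesEscapeSep,
        # BracketsEscapeSep); separators there count as ordinary characters.
        if ch == '"' and quotes_escape and bracket_depth == 0:
            in_quotes = not in_quotes
        elif ch == "[" and brackets_escape and not in_quotes:
            bracket_depth += 1
        elif ch == "]" and brackets_escape and not in_quotes and bracket_depth:
            bracket_depth -= 1

        if ch in seps and not in_quotes and bracket_depth == 0:
            # With MergeSuccessiveSep NO, two separators in a row yield an
            # empty field; with YES the empty field is skipped.
            if current or not merge_successive:
                fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_primary(line):
    """Pair each token with the field ID at the same position (0 = unused)."""
    tokens = split_fields(line)
    return {fid: tok for fid, tok in zip(PRIMARY_POSITIONS, tokens) if fid != 0}


# Made-up S3 access-log line used only to exercise the sketch.
SAMPLE = ('OWNER bucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 REQUESTER '
          'REQID REST.GET.OBJECT photos/cat.jpg '
          '"GET /bucket1/photos/cat.jpg HTTP/1.1" 200 - 2662 2662 70 10 '
          '"-" "curl/7.64.1" -')

for field_id, value in parse_primary(SAMPLE).items():
    print(field_id, value)
```

Because `zip()` stops at the shorter sequence, this sketch silently ignores any tokens beyond the 18 listed IDs. S3 has added further fields to its access-log format over time, and this map does not say how Urchin treats them.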
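The CALC row columns are easier to follow as code. The Python sketch below applies one row, such as the commented-out #226 example (`10 (.*) 11 (.*) $A1|$B1 BOTH YES NO`), to a record held as a dict of field ID to value. It encodes one reading of the column table, not Urchin's implementation. The helper `apply_calc` and the dict record are hypothetical. "Has data" is assumed to mean the source field is non-empty and its expression matches (via `re.search`). A `$A`/`$B` reference to a group that did not match is assumed to become an empty string.

```python
import re


def _match(value, pattern, flags):
    """Return a regex match for one source field, or None if it has no data.

    Assumption: an empty or missing field counts as "no data", and the
    expression may match anywhere in the value (re.search).
    """
    if not value:
        return None
    return re.search(pattern, value, flags)


def apply_calc(record, target, src_a, exp_a, src_b, exp_b, construct,
               require="BOTH", override=True, case=False):
    """Apply one CALC row to `record` (a dict of field ID -> value).

    Hypothetical helper illustrating the documented columns; it is not
    Urchin's own code.
    """
    # CASE No (the default) means the expressions ignore letter case.
    flags = 0 if case else re.IGNORECASE
    matches = {
        "A": _match(record.get(src_a), exp_a, flags),
        "B": _match(record.get(src_b), exp_b, flags),
    }
    has_a = matches["A"] is not None
    has_b = matches["B"] is not None

    # REQUIRE: which sources must have data before output is created.
    required = {
        "A": has_a,
        "B": has_b,
        "BOTH": has_a and has_b,
        "EITHER": has_a or has_b,
        "-": True,
    }[require.upper()]
    if not required:
        return record

    # OVERRIDE No: keep a value that is already present in the target field.
    if record.get(target) and not override:
        return record

    def substitute(ref):
        # $A1 -> group 1 of the EXP-A match; missing groups become "".
        found = matches[ref.group(1)]
        index = int(ref.group(2))
        if found is None or index > found.re.groups:
            return ""
        return found.group(index) or ""

    record[target] = re.sub(r"\$([AB])(\d+)", substitute, construct)
    return record


# The commented-out #226 row, applied to a record holding fields 10 and 11.
row_226 = dict(target=226, src_a=10, exp_a="(.*)", src_b=11, exp_b="(.*)",
               construct="$A1|$B1", require="BOTH", override=True, case=False)
print(apply_calc({10: "200", 11: "2662"}, **row_226))
# {10: '200', 11: '2662', 226: '200|2662'}
```

With REQUIRE set to BOTH, a record missing field 11 is left unchanged. Switching to EITHER would instead produce `200|`, with the B half empty.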