@stevemohapibanks
Created March 15, 2011 09:39
Strips out and reformats an NPG access log file
# First reformat into CSV and remove duff fields
cat npgj2ee1.nature.com_nature_access.log.2011-03-11 | awk -F"( \"| -)" '{ if ($1 ~ /^Timestamp /) {next;}; printf "%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s\n", $1, $2, $4, $5, $6, $7, $9, $10, $11, $12, $13, $14 }' | sed -e 's/"//g' > sample.log
# Now process sensitive fields
require 'csv'
require 'digest/md5'

# Read the CSV named by the first argument and rebuild it row by row
output_string = CSV.generate do |output|
  CSV.foreach(ARGV.first) do |row|
    output << row.map do |field|
      # Replace any field that looks like an IPv4 address with its MD5 digest
      if !field.nil? && field.strip =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/
        Digest::MD5.hexdigest(field)
      else
        field
      end
    end
  end
end

STDOUT.write output_string
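
To see what the Ruby step does to a single row, here is a minimal sketch that applies the same per-field rule to one made-up line. The helper name `anonymise_row`, the constant `IPV4_PATTERN`, and the sample line are assumptions for illustration only; they are not part of the script above, and the address comes from the documentation range 192.0.2.0/24.

```ruby
require 'csv'
require 'digest/md5'

# Same pattern the script uses to spot IPv4-looking fields
IPV4_PATTERN = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/

# Hypothetical helper: hashes fields that look like IPv4 addresses,
# leaves every other field (including nil) untouched.
def anonymise_row(row)
  row.map do |field|
    if !field.nil? && field.strip =~ IPV4_PATTERN
      Digest::MD5.hexdigest(field)
    else
      field
    end
  end
end

# Made-up sample in the ", "-separated shape the awk step emits
sample_line = "192.0.2.10, GET /index.html, 200\n"
row = CSV.parse_line(sample_line)
anonymised = anonymise_row(row)

# Prints the row with the first field replaced by a 32-character hex digest
puts anonymised.to_csv
```

The same input always gives the same digest, so requests from one address stay grouped together in the output. Note that the digest is taken over the field exactly as read, so a field with a leading space (left by the `, ` separator) hashes differently from the same address without one.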