@stilist
Created June 30, 2020 21:14
Count date-like strings in files in /var/log
#!/bin/bash
set -euo pipefail
log_directory="${1:-/var/log}"
year="20[[:digit:]]{2}"
day="[[:digit:]]{2}"
month_num="[[:digit:]]{1,2}"
month_alpha="[[:upper:]][[:lower:]]{2}"
separator="[-\/]"
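# Year-first numeric dates (only the dash form is strictly ISO 8601):
# 2020-06-29
# 2020/06/29
iso_8601="${year}${separator}${month_num}${separator}${day}"
# Day-first dates with an abbreviated month name, as in web server access
# logs (despite the variable name, this is not US month/day/year order):
# 30/Jun/2020
american="${day}${separator}${month_alpha}${separator}${year}"
date_pattern="(${iso_8601}|${american})"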
# * `grep`: recursively search the files in `log_directory` for anything that
#   looks like a date (`date_pattern`) and print the matching lines
# * `sed`:
#   * remove everything after the first date on the line
#   * remove everything before that date
#   * replace slashes with dashes so `date` can parse dates that use a month's
#     name instead of a number (`30/Jun/2020` => `30-Jun-2020`)
# * `date`: parse each line of standard input as a date and print it in
#   `YYYY-MM-DD` format
# * `sort`: group identical dates together, because `uniq` only collapses
#   adjacent duplicate lines
# * `uniq`: count how many times each date appears
# * `sort`: order the dates by count, highest first
grep \
  --color=never \
  --dereference-recursive \
  --extended-regexp \
  --no-filename \
  "${date_pattern}" \
  "${log_directory}" \
  2>/dev/null \
  | sed \
    --quiet \
    --regexp-extended \
    -e "s/(${date_pattern}).*$/\1/p" \
  | sed \
    --quiet \
    --regexp-extended \
    -e "s/^.*(${date_pattern})/\1/p" \
  | sed 's:/:-:g' \
  | date --file=- +'%F' \
  | sort \
  | uniq --count \
  | sort \
    --key=1,1 \
    --numeric-sort \
    --reverse
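
The extract-then-count idea in the script can be tried on a few sample lines without touching `/var/log`. The following is a minimal sketch, not the gist author's method: it swaps the `grep`/`sed` steps for bash's built-in `[[ ... =~ ... ]]` regex matching, and it skips the `date` normalization step, so month-name dates stay in `30-Jun-2020` form. The function names `extract_first_date` and `count_dates` and the sample log lines are hypothetical, introduced only for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Same building blocks as the script above. The separator is written without
# the backslash, which bash's regex matching does not need.
year="20[[:digit:]]{2}"
day="[[:digit:]]{2}"
month_num="[[:digit:]]{1,2}"
month_alpha="[[:upper:]][[:lower:]]{2}"
separator="[-/]"
numeric_date="${year}${separator}${month_num}${separator}${day}"
named_month_date="${day}${separator}${month_alpha}${separator}${year}"
date_pattern="(${numeric_date}|${named_month_date})"

# Hypothetical helper: print the first date-like string found in "$1", with
# slashes replaced by dashes. Returns 1 when the line contains no date.
extract_first_date() {
  local line="$1"
  if [[ "${line}" =~ ${date_pattern} ]]; then
    # BASH_REMATCH[0] holds the leftmost match of the whole pattern.
    printf '%s\n' "${BASH_REMATCH[0]//\//-}"
  else
    return 1
  fi
}

# Hypothetical helper: read lines from standard input, keep the first date on
# each line, and print "count date" pairs with the highest count first.
count_dates() {
  local line
  while IFS= read -r line; do
    # Lines without a date are skipped instead of aborting under `set -e`.
    extract_first_date "${line}" || true
  done | sort | uniq -c | sort -k1,1nr
}

# Demo on made-up log lines (an assumption, not real log output).
count_dates <<'EOF'
127.0.0.1 - - [30/Jun/2020:21:14:00 +0000] "GET / HTTP/1.1" 200
2020-06-29 12:00:01 kernel: started
2020/06/29 12:05:00 cron: job ran
127.0.0.1 - - [30/Jun/2020:21:15:00 +0000] "GET /a HTTP/1.1" 200
127.0.0.1 - - [30/Jun/2020:21:16:00 +0000] "GET /b HTTP/1.1" 200
no date on this line
EOF
```

With these sample lines the demo should report three occurrences of `30-Jun-2020` and two of `2020-06-29`; the exact column padding depends on the local `uniq`. Because the sketch leaves out `date`, it does not merge `30-Jun-2020` with a numeric `2020-06-30` the way the full script does.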