Search for a pattern, displaying last modification time

Problem

Use grep to search for a pattern in a set of files, but in the output, display the last modification time (mtime) of each file along with the matching line.

Solution

Use a shell for loop to go over files one at a time, then decorate each output line with the file's last modification date, as reported by the stat command line tool:

for f in *.txt; do mtime=$(stat -c %y "$f" | sed 's/\.[0-9]*//') && grep needle "$f" | while IFS= read -r line; do printf '[%s] %s:%s\n' "$mtime" "$f" "$line"; done; done

Or, on BSD systems, including OS X:

for f in *.txt; do mtime=$(stat -t '%F %T %z' -f %Sm "$f") && grep needle "$f" | while IFS= read -r line; do printf '[%s] %s:%s\n' "$mtime" "$f" "$line"; done; done

Description

Assuming we want to find pattern needle in all files with a .txt extension, here is the bare grep command:

grep needle *.txt

This prints out filenames and lines that matched:

haystack1.txt:needle
haystack2.txt:this line contains: the string "needle"

We want the output to include each file's last modification time. The stat command line tool can report it: a custom output format makes it emit just the last modification time.

For GNU stat (i.e., on Linux systems):

stat -c %y example.txt

For BSD stat (i.e., on BSD derived systems, including OS X):

stat -t '%F %T %z' -f %Sm example.txt
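
If one command needs to work on both kinds of system, a small wrapper can try the GNU syntax first and fall back to the BSD syntax. This is only a sketch: the function name file_mtime is made up for illustration, and it assumes that a stat which rejects -c is a BSD stat.

```shell
# Hypothetical helper (not part of the original gist): print a file's mtime
# as "YYYY-MM-DD HH:MM:SS +ZZZZ" with whichever stat syntax is available.
file_mtime() {
    if stat -c %y "$1" >/dev/null 2>&1; then
        # GNU stat: %y includes fractional seconds, which sed strips
        stat -c %y "$1" | sed 's/\.[0-9]*//'
    else
        # Assumed to be BSD stat: -t sets the time format used by %Sm
        stat -t '%F %T %z' -f %Sm "$1"
    fi
}
```

Calling file_mtime example.txt should then print the same timestamp layout on either system.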

The following examples will include both syntaxes, GNU first, BSD second.

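To put both last modification time and matching line on the same line of output, we can use the relational join command, which pairs up lines from two inputs that share the same first field (here, the filename). The cmd1 <(cmd2) <(cmd3) syntax is Bash process substitution: each <(...) is replaced by a file name from which cmd1 can read that command's output, so a tool that expects file arguments can read from two pipelines: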

join <(grep -l needle *.txt | xargs stat -c '%n [%y]') <(grep needle *.txt | sed 's/:/ /') | sed 's/\.[0-9]\{9\}//; s/\] /\]:/'
join <(grep -l needle *.txt | xargs stat -t '%F %T %z' -f '%N [%Sm]') <(grep needle *.txt | sed 's/:/ /') | sed 's/\] /\]:/'
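
To see what join does with its two inputs, here is a sketch that uses made-up sample data and temporary files instead of process substitution, so it also runs in shells other than Bash. The function name join_demo, the filenames, and the timestamps are illustrative only.

```shell
# Sketch with made-up data: the same join as above, with each input written
# to a temporary file (in Bash, <(...) supplies such a file automatically).
join_demo() {
    left=$(mktemp) && right=$(mktemp) || return 1
    # Left input: "filename [mtime]", the shape the stat command prints
    printf '%s\n' 'a.txt [2014-04-23 10:00:00 -0700]' \
                  'b.txt [2014-04-24 11:30:00 -0700]' > "$left"
    # Right input: grep output with the first colon turned into a space
    printf '%s\n' 'a.txt:needle' 'b.txt:a needle here' | sed 's/:/ /' > "$right"
    # join pairs lines whose first field (the filename) is equal;
    # sed then puts the colon back after the closing bracket
    join "$left" "$right" | sed 's/] /]:/'
    rm -f "$left" "$right"
}
```

Running join_demo prints:

```shell
# a.txt [2014-04-23 10:00:00 -0700]:needle
# b.txt [2014-04-24 11:30:00 -0700]:a needle here
```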

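You can wrap this up into a Bash function that takes the pattern and the files as arguments:

# GNU stat
mtime_grep() {
    pattern=$1; shift
    # -H keeps the filename prefix even when only one file is given
    join <(grep -l -e "$pattern" "$@" | xargs stat -c '%n [%y]') \
         <(grep -H -e "$pattern" "$@" | sed 's/:/ /') \
        | sed 's/\.[0-9]\{9\}//; s/\] /\]:/'
}
# BSD stat
mtime_grep() {
    pattern=$1; shift
    join <(grep -l -e "$pattern" "$@" | xargs stat -t '%F %T %z' -f '%N [%Sm]') \
         <(grep -H -e "$pattern" "$@" | sed 's/:/ /') \
        | sed 's/\] /\]:/'
}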

This lets you run:

mtime_grep needle *.txt

However, this solution has several flaws:

  • The grep command is run twice, which may be slow for large searches.
  • The join command will fail for filenames with spaces, colons, or newlines in them.
  • The last modification time is displayed after the filename, so it is difficult to sort the output by last modification time, and the output is not visually lined up in columns.
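
The filename problem can be reproduced with made-up data. In this sketch (the function name join_space_demo and the sample lines are illustrative), two filenames share the first word "my"; because join splits fields on whitespace, it treats "my" as the key for both.

```shell
# Sketch with made-up data: filenames containing a space confuse join.
join_space_demo() {
    left=$(mktemp) && right=$(mktemp) || return 1
    printf '%s\n' 'my file.txt [2014-04-23 10:00:00 -0700]' \
                  'my notes.txt [2014-04-24 11:30:00 -0700]' > "$left"
    printf '%s\n' 'my file.txt needle' 'my notes.txt needle' > "$right"
    # Both inputs have two lines with the key "my", so every left line is
    # paired with every right line: four output lines instead of two
    join "$left" "$right"
    rm -f "$left" "$right"
}
```

Piping join_space_demo into wc -l shows four lines, two of which attach a timestamp to the wrong file.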

The simplest way to handle filenames that contain spaces, colons, or newlines is to rely on the shell's built-in filename expansion (globbing), which passes each name through intact. We pass files to grep one at a time, then prepend the last modification time and filename to every matching line:

for f in *.txt; do mtime=$(stat -c %y "$f" | sed 's/\.[0-9]*//') && grep needle "$f" | while IFS= read -r line; do printf '[%s] %s:%s\n' "$mtime" "$f" "$line"; done; done
for f in *.txt; do mtime=$(stat -t '%F %T %z' -f %Sm "$f") && grep needle "$f" | while IFS= read -r line; do printf '[%s] %s:%s\n' "$mtime" "$f" "$line"; done; done

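This loop fixes all three flaws of the join approach: grep runs once per file, the shell's glob (wildcard) expansion copes with unusual filenames, and the fixed-width modification time leads each line, so the output sorts by time and lines up in columns. Because the calling shell expands the glob before a command runs, the loop can also be wrapped in a function or a script that iterates over "$@" instead of *.txt.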
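
One way to package the loop, as a sketch rather than a definitive implementation: the calling shell expands *.txt before the function runs, so the function only needs to iterate over its arguments. The name mtime_grep is reused from the join version above, and the fallback assumes that an empty result from stat -c means a BSD stat.

```shell
# Hypothetical wrapper: mtime_grep PATTERN FILE...
mtime_grep() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip an unmatched glob or a directory
        # Try GNU stat first; if it prints nothing, assume BSD stat
        mtime=$(stat -c %y "$f" 2>/dev/null | sed 's/\.[0-9]*//')
        [ -n "$mtime" ] || mtime=$(stat -t '%F %T %z' -f %Sm "$f")
        # Prefix every matching line with the mtime and the filename
        grep -e "$pattern" "$f" | while IFS= read -r line; do
            printf '[%s] %s:%s\n' "$mtime" "$f" "$line"
        done
    done
}
```

It is called as mtime_grep needle *.txt. Since the timestamp comes first, piping the result through sort orders the matches by modification time, as long as all timestamps share one UTC offset.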
