@sjl
Created September 19, 2012 21:52
friendly-find

Brainstorming a friendlier find(1).

Usage

Goals:

  • Work for 80-90% of use cases.
  • DTRT most of the time.
  • Don't punish the users.

Basic Usage

ffind [opts] [pattern]

-r             directory to root the search in (default: .)
-f             follow symlinked directories and search their contents too
-d             maximum depth to search
-0             separate matches with a null byte (useful for xargs -0)
-l             force a literal pattern, even if it looks like a regex
-i             case-insensitive matching
-v             invert match
-b             exclude binary files

-a             don't ignore anything (vcs dirs, ignored files, etc can all match)
-u             unrestricted search (VCS dirs are still skipped, but no ignore files are parsed)
-q             semi-restricted search (VCS dirs are skipped and .ffignore is parsed, but VCS ignore files aren't)

-s             restrict possible matches to paths given on stdin

--after time   match things modified after the given time, exclusive
--before time  match things modified before the given time, exclusive
--since time   match things modified after the given time, inclusive
--until time   match things modified before the given time, inclusive
--at time      match things modified at the given time (arbitrarily specific)

Times

There's a lot of logic to times, but they're pretty intuitive to use in practice:

ffind foo --before today
ffind foo --after 2010-09

ffind foo --since yesterday
ffind foo --until 2010-09-05

ffind foo --at 8pm
ffind foo --at 8:00pm

ffind foo --since 10m
ffind foo --before 1h

ffind foo --since 2012 --before 2012-04

Specifying Times

Times can be given in an "absolute" form, or an "ago" form:

2012-09-05   that date, inclusive

2012-09-05 14:55  that date and time
2012-09-05 8 pm   that date and time

8 pm   the last 8 PM that happened 
14:55  the last 2:55 PM that happened

1h     one hour ago
1h30m  one hour, thirty minutes ago
1d     one day (24 hours) ago

Bare times like 8 pm always refer to the last time they happened. It may be today if you're running it later in the day, or yesterday if it's in the morning.
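
A minimal sketch of the "last time it happened" rule for bare times, assuming naive local datetimes. The function name `last_occurrence` is hypothetical and not part of any existing ffind code:

```python
from datetime import datetime, time, timedelta


def last_occurrence(clock: time, now: datetime) -> datetime:
    """Return the most recent datetime at or before `now` with wall-clock time `clock`."""
    candidate = datetime.combine(now.date(), clock)
    if candidate > now:
        # That time hasn't happened yet today, so fall back to yesterday.
        candidate -= timedelta(days=1)
    return candidate


# Run in the evening, "8 pm" resolves to today.
print(last_occurrence(time(20, 0), datetime(2012, 9, 19, 21, 52)))
# Run in the morning, "8 pm" resolves to yesterday.
print(last_occurrence(time(20, 0), datetime(2012, 9, 19, 9, 0)))
```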

There are also some special forms:

yesterday
today

These are exactly equivalent to specifying the year-month-day form.

Restricting Dates by Ranges

There are two types of date range filters, inclusive and exclusive. For example:

                    2009-------2010-------2011-------2012------...

--since 2010                   [..................................>
--after 2010                              [.......................>

--until 2010       <.....................]
--before 2010      <..........]

--since 2010 --before 2012     [....................]

                    2009-------2010-------2011-------2012------...
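
One way to read the chart: a time like 2010 covers a half-open span [start, end), and each option compares the modification time against one edge of that span. The sketch below assumes that reading; `year_span` and `in_range` are hypothetical names used only for illustration:

```python
from datetime import datetime


def year_span(year):
    """The half-open span [start, end) covered by a bare year such as 2010."""
    return datetime(year, 1, 1), datetime(year + 1, 1, 1)


def in_range(mtime, span, mode):
    """Apply one of the four range options to a modification time."""
    start, end = span
    if mode == "since":   # inclusive: the span itself matches
        return mtime >= start
    if mode == "after":   # exclusive: only things past the whole span match
        return mtime >= end
    if mode == "until":   # inclusive: the span itself matches
        return mtime < end
    if mode == "before":  # exclusive: only things prior to the whole span match
        return mtime < start
    raise ValueError(f"unknown mode: {mode}")


mid_2010 = datetime(2010, 6, 1)
for mode in ("since", "after", "until", "before"):
    print(mode, in_range(mid_2010, year_span(2010), mode))
```

Combining options, as in --since 2010 --before 2012, would then be a logical AND of the individual checks.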

The precision of a date depends on how precisely you specify it. For example:

                    10----11----12----1-----
--since '11 am'           [.................>
--since '11:30 am'            [.............>
--after '11 am'                 [...........>
--after '11:05 am'         [................>
                    10----11----12----1-----

"Ago" times resolve to one specific millisecond. For example:

  • '1m' and '1m0s0ms' are equivalent.
  • '1h' and '1h0m0s0ms' are equivalent.
  • '1d12h' and '1d12h0m0s0ms' are equivalent.
  • '2012-09-05 11 am' and '2012-09-05 11:00:00.000' are equivalent.

Examples:

Current time: 1:30

                    10----11----12----1-----
--since '1h'                       [........>
--since '30m'                         [.....>
--after '1h'                       [........>
--before '2h15m'    <.......]
                    10----11----12----1-----
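
A rough sketch of how "ago" strings could be turned into an exact cutoff, assuming the unit letters d, h, m, s and ms seen in the examples above. `parse_ago` is a hypothetical helper, not an existing ffind function:

```python
import re
from datetime import datetime, timedelta

# Unit suffix -> timedelta keyword. "ms" is tried before "m" in the regex.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds", "ms": "milliseconds"}
_TOKEN = re.compile(r"(\d+)(ms|d|h|m|s)")


def parse_ago(text):
    """Parse strings like '1h30m' or '1d12h0m0s0ms' into a timedelta."""
    if not re.fullmatch(r"(?:\d+(?:ms|d|h|m|s))+", text):
        raise ValueError(f"not an ago time: {text!r}")
    total = timedelta()
    for amount, unit in _TOKEN.findall(text):
        total += timedelta(**{_UNITS[unit]: int(amount)})
    return total


now = datetime(2012, 9, 19, 13, 30)
print(now - parse_ago("1h"))     # cutoff for --since '1h'
print(now - parse_ago("2h15m"))  # cutoff for --before '2h15m'
```

Because the result is a single instant, --since and --after differ only in whether that exact millisecond is included.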

Restricting with "at"

When used with --at the scope is determined by how you give the time:

--at '2010'
matches things modified in the year 2010

--at '2012-09'
matches things modified in September 2012

--at '1 pm'
matches things modified between 13:00:00 and 13:59:59

--at '1:00 pm'
matches things modified between 13:00:00 and 13:00:59

--at '1:00:00 pm'
matches things modified between 13:00:00 and 13:00:00 999ms

--at '0h'  (performed at 8:55 AM)
matches things modified between 8:00 and 8:59:59
roughly equivalent to --after 55m in this case
"this hour"

--at '1h'  (performed at 8:55 AM)
matches things modified between 7:00 and 7:59:59
"last hour"

--at '1h30m'  (performed at 8:55 AM)
matches things modified between 7:25:00 and 7:25:59

--at '5m'  (performed at 8:55 AM)
matches things modified between 8:50:00 and 8:50:59
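
For "ago" forms, the window appears to follow from the finest unit given: subtract the offset from now, truncate to that unit, and match one unit's worth of time. A sketch under that assumption follows. `at_window` is a hypothetical name, day precision is my own extrapolation, and milliseconds are left out for brevity:

```python
import re
from datetime import datetime, timedelta

# Unit letter -> (timedelta keyword, fields zeroed when truncating, window width)
UNITS = {
    "d": ("days", dict(hour=0, minute=0, second=0, microsecond=0), timedelta(days=1)),
    "h": ("hours", dict(minute=0, second=0, microsecond=0), timedelta(hours=1)),
    "m": ("minutes", dict(second=0, microsecond=0), timedelta(minutes=1)),
    "s": ("seconds", dict(microsecond=0), timedelta(seconds=1)),
}
ORDER = "dhms"  # coarse to fine


def at_window(ago, now):
    """Return the half-open window [start, end) that --at would match."""
    parts = re.findall(r"(\d+)([dhms])", ago)
    if not parts or "".join(n + u for n, u in parts) != ago:
        raise ValueError(f"not an ago time: {ago!r}")
    offset = sum((timedelta(**{UNITS[u][0]: int(n)}) for n, u in parts), timedelta())
    finest = max((u for _, u in parts), key=ORDER.index)
    _, zeroed, width = UNITS[finest]
    start = (now - offset).replace(**zeroed)
    return start, start + width


now = datetime(2012, 9, 19, 8, 55)
for spec in ("0h", "1h", "1h30m"):
    start, end = at_window(spec, now)
    print(spec, start.time(), "to just before", end.time())
```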

Types

Type options:

-A match all types of things (default)
-F match all types of files (real and symlinked)
-D match all types of dirs (real and symlinked)

-E match real (non-symlinked) files
-C match real (non-symlinked) dirs

-X match symlinked files
-Y match symlinked dirs

-R match real (non-symlinked) things
-S match symlinked things

Options can be combined, and will be unioned:

-R   == -EC
-RS  == -A
-EXC == -FC
-FS  == -EXY

Examples:

-D  match any kind of directory
-R  match all real things
-FC match any kind of file, and real directories
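
The union behaviour can be modelled by expanding every flag into a set of the four base types (E, X, C, Y). This is only an illustrative sketch: `expand` and `classify` are hypothetical names, and treating a broken symlink as a symlinked file is my own assumption:

```python
import os

# Each flag letter expands to a set of the four base types.
FLAGS = {
    "E": {"E"}, "X": {"X"}, "C": {"C"}, "Y": {"Y"},
    "F": {"E", "X"},            # all files
    "D": {"C", "Y"},            # all dirs
    "R": {"E", "C"},            # real things
    "S": {"X", "Y"},            # symlinked things
    "A": {"E", "X", "C", "Y"},  # everything
}


def expand(flags):
    """Union the base types selected by a combined flag string such as 'FC'."""
    selected = set()
    for letter in flags:
        selected |= FLAGS[letter]
    return frozenset(selected)


def classify(path):
    """Return the base type letter for a path (broken symlinks count as 'X' here)."""
    is_dir = os.path.isdir(path)  # follows symlinks
    if os.path.islink(path):
        return "Y" if is_dir else "X"
    return "C" if is_dir else "E"


print(sorted(expand("FC")))
print(classify(".") in expand("D"))
```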

The type hierarchy:

                               all things (A)
                                /          \
                               /            \
                        files (F)    directories (D)
            ___________/   |                |    \________
           /               |                |             \
real files (E)  symlinks to files (X)    real dirs (C)  symlinks to dirs (Y)
           \                       \     /                        /
            \                    ---\----                        /
             \                  /    \                          /
              ----real things (R)     symlinked things (S)------
                               \            /
                                \          /
                               all things (A)

Ignoring

Ignored by default:

.hg/**
.git/**
.svn/**

Ignorefiles parsed by default:

.ffignore
.hgignore
.gitignore

Ignore options:

        +------------------------------------------------------------+
        | parse .ffignore | parse VCS ignore files | ignore VCS dirs |
        +-----------------+------------------------+-----------------+
default | yes             | yes                    | yes             |
-q      | yes             |                        | yes             |
-u      |                 |                        | yes             |
-a      |                 |                        |                 |
        +------------------------------------------------------------+
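
The table above maps directly onto a small lookup structure. The sketch below is one possible encoding; `IgnorePolicy`, `POLICIES` and `ignore_files_for` are hypothetical names, not part of any real implementation:

```python
from typing import NamedTuple


class IgnorePolicy(NamedTuple):
    parse_ffignore: bool
    parse_vcs_ignore_files: bool
    skip_vcs_dirs: bool


# One row per line of the table above.
POLICIES = {
    "default": IgnorePolicy(True, True, True),
    "-q": IgnorePolicy(True, False, True),
    "-u": IgnorePolicy(False, False, True),
    "-a": IgnorePolicy(False, False, False),
}

VCS_DIRS = {".hg", ".git", ".svn"}
VCS_IGNORE_FILES = [".hgignore", ".gitignore"]


def ignore_files_for(mode="default"):
    """List the ignore files that would be parsed in the given mode."""
    policy = POLICIES[mode]
    names = []
    if policy.parse_ffignore:
        names.append(".ffignore")
    if policy.parse_vcs_ignore_files:
        names.extend(VCS_IGNORE_FILES)
    return names


for mode in POLICIES:
    print(mode, ignore_files_for(mode), "skip VCS dirs:", POLICIES[mode].skip_vcs_dirs)
```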

friendly-find versus find

Why Use friendly-find?

  • Ignores VCS directories for you.
  • Ignores stuff in .gitignore, .hgignore, etc for you.
  • Faster when running in a VCS-managed tree because it can ignore a lot.
  • Saner command line syntax (normal options instead of a crazy expression language).
  • Designed to do the right thing, usability-wise, in most cases.

Why Use find?

  • Faster when running in a non-VCS-managed tree (because ffind's ignoring doesn't help it).
  • More powerful expression language, when you need to do something crazy.
@sjl
Author

sjl commented Sep 19, 2012

Maybe a smart-case option should be in there somewhere.

@sjl
Author

sjl commented Sep 19, 2012

Also probably needs a way to specify creation times. Maybe just --created-* versions of the modified opts?

@sjl
Author

sjl commented Sep 19, 2012

Size filtering... ideas:

--larger-than 1mb
--smaller-than 1.5gb
--size '<1k'
--size '>=10kb'
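
A rough sketch of how such size expressions might be parsed. It assumes binary multiples (1k = 1024 bytes), which the ideas above don't specify, and `parse_size` and `size_predicate` are hypothetical names:

```python
import operator
import re

# Assumption: binary multiples; the list above does not say 1000 vs 1024.
MULTIPLIERS = {"": 1, "b": 1, "k": 1024, "kb": 1024,
               "m": 1024 ** 2, "mb": 1024 ** 2,
               "g": 1024 ** 3, "gb": 1024 ** 3}
OPERATORS = {"<": operator.lt, "<=": operator.le,
             ">": operator.gt, ">=": operator.ge, "": operator.eq}


def parse_size(text):
    """Turn '1mb' or '1.5gb' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*", text.lower())
    if not match or match.group(2) not in MULTIPLIERS:
        raise ValueError(f"bad size: {text!r}")
    return int(float(match.group(1)) * MULTIPLIERS[match.group(2)])


def size_predicate(expr):
    """Turn '<1k' or '>=10kb' into a function that tests a size in bytes."""
    match = re.fullmatch(r"\s*(<=|>=|<|>)?\s*(.+)", expr)
    if not match:
        raise ValueError(f"bad size expression: {expr!r}")
    compare = OPERATORS[match.group(1) or ""]
    limit = parse_size(match.group(2))
    return lambda size: compare(size, limit)


print(parse_size("1.5gb"))
print(size_predicate("<1k")(512))
```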

@sjl
Author

sjl commented Sep 19, 2012

Maybe this type chart is easier to understand:

[image: type hierarchy]

@sjl
Author

sjl commented Sep 19, 2012

And this time with the right options:

[image: type hierarchy diagram]

@mattboehm

I was trying to brainstorm type flags where the 4 base types had memorable letters, and the 4 combinations were also easy to remember, but this may be more confusing than helpful.

Example:

-F: real File
-L: Linked file
-FL: FiLes
-D: real Dir
-I: Indirect link to dir
-DI: DIrs
-FD: real File/Dir
-LI: LInks
-A: All

@mattboehm

I guess one downside is that -F and -D are the more "obvious" flags and people may expect them to include symlinks by default.

@mattboehm

Maybe an --ignore <path/pattern> option to exclude certain subdirs or paths matching the pattern from the list? In theory it could be part of the pattern, but I find myself using ack's --ignore-dir flag quite a bit.

@petdance

I like the idea. ffind is an excellent way to make a new, longer name that is just as quick to type.

This feels like you're trying to make an improvement to find like ack is an improvement to grep, with minimal confusion between the two. When I was adding ack features, I ran grep --help and looked for what I wanted to steal. There's a lot of find functionality that you'll have to keep for compatibility, like -print0 and so on.

@kkuchta

kkuchta commented Sep 20, 2012

-l force a literal pattern, even if it looks like a regex

Does that imply you're going to have it guess whether [pattern] is a regex or literal by default?

@sjl
Author

sjl commented Sep 20, 2012

@petdance yep, that's the goal. friendly-find : find :: ack : grep, and then eventually fast-friendly-find : friendly-find :: ag : ack.

I'm trying to strike a balance between being simple and easy to use, and trying to have enough functionality to allow find users to still have enough power.

@kkuchta yep. literal strings are way faster to match, so if possible it's ideal to use them

@dmedvinsky

Couple of thoughts:

  • It would be great if smart case was the default, i.e. if the pattern is all lowercase then -i is implied; if there is an uppercase character anywhere in the pattern, case-insensitive matching is disabled unless explicitly requested.
  • My most frequent use of find is find -name '*something*'. The second is find -name '*.pyc'. What I'm saying is that globbing support would be good to have, so I don't have to type .* instead of * in those.
  • As a side thought, maybe ffind should default to substring matching?
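
A minimal sketch of the smart-case rule from the first bullet, using Python's `re` module. `compile_pattern` is a hypothetical helper. A real implementation would probably need to skip regex escapes such as \S or \W when looking for uppercase characters, which this sketch does not do:

```python
import re


def compile_pattern(pattern, ignore_case=None):
    """Compile a pattern; when ignore_case is None, apply the smart-case rule."""
    if ignore_case is None:
        # Smart case: ignore case only if the pattern has no uppercase characters.
        ignore_case = not any(ch.isupper() for ch in pattern)
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)


print(bool(compile_pattern("spam").search("SPAM.txt")))   # all lowercase: -i implied
print(bool(compile_pattern("Spam").search("spam.txt")))   # uppercase present: case matters
print(bool(compile_pattern("Spam", ignore_case=True).search("spam.txt")))  # explicit -i
```

Using `search` rather than `fullmatch` also gives the substring matching suggested in the last bullet.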

@sjl
Author

sjl commented Sep 22, 2012

How should we handle paths? For example:

$ cd /foo/bar
$ ls
baz
$ ls baz
spam eggs
$ ffind 'spam'
./baz/spam

Now, does the pattern include the full path, or does it just try to match
filenames? I.e. should this work?

$ ffind 'b.*spam'
./baz/spam

If so, should it match against the absolute path, or just the relative path from
wherever you happen to be rooted?

$ ffind 'f.*b.*spam'
./baz/spam

matched because of:

(/foo/bar)./baz/spam

I was planning on making absolute-path output an option, if that changes
things:

$ ffind --print-abs-paths 'spam'
/foo/bar/baz/spam

@teoljungberg

$ ffind 'spam'
./baz/spam

I interpret that as if 'spam' exists inside of a directory 'baz', which is a bit confusing. I don't have any better suggestions, though.

  • --print-abs-path is too long, i.e. --abs-path is better
  • SmartCase should definitely be default!

@tikitu

tikitu commented Sep 22, 2012

My immediate reaction to the (/foo/bar)./baz/spam is "wtf?" -- I think it will be a gotcha much more often than it will be desired behaviour. You're walking dirs rooted at . so least-surprise matching should be rooted at . also.

@sjl
Author

sjl commented Sep 22, 2012

@tikitu yeah I think I agree. That just leaves the question of whether to search basenames or full paths (relative to the root). find's -name only does basenames, so

$ find . -name 'spam'
./baz/spam
$ find . -name 'baz*spam'
$

My first thought is to do that as well, for less surprise. Maybe it's worth making an option (--match-full) though?
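
The two behaviours under discussion can be sketched as follows, assuming regex patterns and paths relative to the search root. `path_matches` and its `match_full` parameter are hypothetical names mirroring the proposed --match-full option:

```python
import os
import re


def path_matches(pattern, rel_path, match_full=False):
    """Match against the basename by default, or the whole relative path on request."""
    target = rel_path if match_full else os.path.basename(rel_path)
    return re.search(pattern, target) is not None


print(path_matches("spam", "./baz/spam"))                      # basename matches
print(path_matches("b.*spam", "./baz/spam"))                   # basename alone lacks the 'b'
print(path_matches("b.*spam", "./baz/spam", match_full=True))  # relative path contains 'baz'
```

Either way, the absolute prefix (/foo/bar in the earlier example) never takes part in matching.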
