@jappy
Created March 10, 2012 18:16
unix command to extract words from a file (Mac/Linux)
tr -sc 'A-Za-z' '\n' < filename.txt
jappy commented Mar 10, 2012

From the tr man page (http://unixhelp.ed.ac.uk/CGI/man-cgi?tr+1):

-C      Complement the set of characters in string1, that is ``-C ab''
          includes every character except for `a' and `b'.

-c      Same as -C but complement the set of values in string1.

-s      Squeeze multiple occurrences of the characters listed in the last
         operand (either string1 or string2) in the input into a single
         instance of the character.  This occurs after all deletion and
         translation is completed.

jappy commented Mar 10, 2012

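This command takes the complement of the set A-Za-z (the -c option), that is, every character that is not an ASCII letter, and translates each such character into a newline ('\n'). The -s option then squeezes each run of consecutive newlines into a single newline, so the output contains one word per line. Note that digits and punctuation count as separators, not as parts of words.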
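
As a quick check of the behaviour described above, here is a minimal sketch that feeds a sample sentence through the same tr invocation. The sample text and the variable name `words` are made up for illustration; the input is piped in with printf rather than read from filename.txt.

```shell
# Hypothetical sample input, piped in instead of read from a file.
# Every non-letter (comma, space, '!', digits, '.', newline) becomes a
# newline, and -s collapses each run of newlines into one.
words=$(printf 'Hello, world! It is 2012...\n' | tr -sc 'A-Za-z' '\n')

# Print one word per line:
# Hello
# world
# It
# is
printf '%s\n' "$words"
```

If the input starts with a non-letter, the output begins with an empty line, because that leading character is also turned into a newline.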
