AWK is named after its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is a utility for writing small but effective programs as a series of statements, each pairing a text pattern to search for in every line of input with the action to take when a line matches.
AWK Operations:
- Scans a file line by line
- Splits each input line into fields
- Compares input line/fields to pattern
- Performs action(s) on matched lines
Program structure:
BEGIN{ BEGIN_ACTIONS; }
/regex1/ { ACTIONS; }
/regex2/ { ACTIONS; }
END{ END_ACTIONS; }
BEGIN is a special pattern that matches once, before any input line is read. This is where you initialize variables and any other state your script needs.
END is its counterpart: it matches once, after all input has been processed. This lets you clean up or print final output before exiting.
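Example (sums column 6 of `ps aux`, the resident memory in KB, over Chrome processes and prints the total in MB):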
ps aux | awk 'BEGIN{total=0} /Chrome/ {total+=$6} END {print total/1024}'
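Because `ps aux` output differs from machine to machine, a reproducible variant of the same BEGIN / pattern / END shape may be easier to experiment with. The sketch below assumes a made-up two-column input (name, memory in KB); the data and the variable names `sample` and `total_mb` are illustrative only.

```shell
# Hypothetical sample data: process name, then memory in KB (not real ps output).
sample='chrome 2048
bash 512
chrome 1024'

# BEGIN runs once before any input, the /chrome/ rule runs for each matching
# line, and END runs once after the last line.
total_mb=$(printf '%s\n' "$sample" |
  awk 'BEGIN { total = 0 } /chrome/ { total += $2 } END { print total / 1024 }')

echo "$total_mb"   # prints 3
```

The explicit `total = 0` mirrors the original example; awk treats an unset variable as 0 in arithmetic, so the BEGIN block is optional here.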
Actions can be something like:
- exit: Ends the program
- next: Skips to the next line of input
- print: Prints to standard output
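The three actions above can be combined in one small program. This is a minimal sketch on invented input; the `STOP` marker and the variable names `input` and `out` are assumptions made for the illustration.

```shell
# Hypothetical input: a comment line, two data lines, a stop marker, more data.
input='# header
alpha
beta
STOP
gamma'

out=$(printf '%s\n' "$input" |
  awk '/^#/ { next }      # skip comment lines; later rules are not tried
       /^STOP$/ { exit }  # stop reading input at the marker
       { print }          # print every remaining line')

echo "$out"
```

Only `alpha` and `beta` are printed: the comment line is skipped by `next`, and `exit` ends the run before `gamma` is read.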
Fields:
- $0: The entire line
- $1, $2, ...: First field, second field, ...
- $NF: Last field
- $(NF-1): Second-to-last field
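A quick way to see the field references in action is to print a few of them for a single line. The input string and the names `line` and `fields` below are illustrative.

```shell
# One input line with four whitespace-separated fields.
line='one two three four'

# First field, last field, second-to-last field, and the field count.
fields=$(echo "$line" | awk '{ print $1, $NF, $(NF-1), NF }')

echo "$fields"   # prints: one four three 4
```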
Built-in Variables | Description |
---|---|
FS | Field separator, regular expression used to separate fields |
RS | Record Separator (lines) |
NR | Ordinal number of the current record |
NF | Number of fields in the current record |
OFS | Output field separator |
ORS | Output Record Separator (lines) |
FNR | Number of the current record in the current file |
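As a rough illustration of how these variables interact, the sketch below sets `FS` and `OFS` in a BEGIN block and prints `NR` and `NF` for each record. The colon-separated records are invented for the example, as are the names `records` and `report`.

```shell
# Hypothetical colon-separated records.
records='root:0:/root
alice:1000:/home/alice'

# FS controls how input lines are split; OFS is placed between the
# comma-separated arguments of print.
report=$(printf '%s\n' "$records" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $3 }')

echo "$report"
```

Each output line shows the record number, the number of fields, and the first and third fields, joined by ` | `.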
Functions | Description |
---|---|
length([string]) | The length of its argument taken as a string; with no argument, the length of $0 |
index(s, t) | The position in s where the string t first occurs, or 0 if it does not occur |
match(s, r) | The position in s where the regular expression r first matches, or 0 if it does not; also sets RSTART and RLENGTH |
sub(r, t [, s]) | Substitutes t for the first occurrence of the regular expression r in the string s ($0 if s is omitted) |
gsub(r, t [, s]) | Same as sub, except that all occurrences of the regular expression are replaced |
tolower(str) | Returns str in lowercase characters |
toupper(str) | Returns str in uppercase characters |
split(str, arr [, fsep ]) | Splits str on fsep (FS if omitted), stores the pieces in arr, and returns the number of pieces |
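The functions in the table can be tried together on one string. This is only a sketch; the input `Hello World` and the variable names are chosen for illustration, and `sub`/`gsub` are applied to copies so that `$0` stays unchanged.

```shell
result=$(echo 'Hello World' | awk '{
  n = length($0)                 # number of characters in the line
  i = index($0, "World")         # position of a fixed substring
  m = match($0, /o+/)            # position of the first regex match
  s = $0; sub(/o/, "0", s)       # replace only the first o in the copy s
  g = $0; gsub(/o/, "0", g)      # replace every o in the copy g
  k = split($0, parts, " ")      # split into parts[1], parts[2]
  print n, i, m, s, g, toupper(parts[1]), tolower(parts[2]), k
}')

echo "$result"   # prints: 11 7 5 Hell0 World Hell0 W0rld HELLO world 2
```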
gsub example (strips the UTF-8 byte-order mark bytes from every line):
awk '{ gsub(/\xef\xbb\xbf/,""); print }' sentences.txt > clean_sentences.txt
if/else with match example:
ps aux | awk '{if(match($12, /Chrome/)){print $2": Chrome Process"}else{print $2": Non-chrome Process"}}'
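for loop example (the body runs twice per matching line, so each PID is printed twice and the total is doubled):
ps aux | awk 'BEGIN{total=0} /Chrome/ {for(i=1; i<=2; i++){print $2; total+=$6}} END {print total/1024}'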
Passing a shell variable into awk with -v:
var=Chrome
ps aux | awk -v a="$var" '{if(match($12, a)){print $2": "a}else{print $2": Non "a}}'
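The same `-v` pattern can be tested without `ps` by feeding awk a fixed input. In this sketch the two input lines and the names `needle` and `labels` are made up; the awk variable `a` is used as a dynamic regular expression, as in the command above.

```shell
# Shell variable handed to awk as the awk variable a.
needle='chrome'

labels=$(printf '%s\n' '101 chrome' '102 bash' |
  awk -v a="$needle" '{
    if (match($2, a)) { print $1 ": " a }
    else              { print $1 ": Non " a }
  }')

echo "$labels"
```

Passing the value with `-v` keeps the awk program in single quotes, which avoids splicing shell text into the program itself.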
Print the lines of file2 that do not appear in file1:
awk 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2
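Count CSV rows by number of fields, allowing quoted fields that contain commas (FPAT is specific to gawk):
awk -v FPAT='([^,]*)|("[^"]+")' '{print NF}' file.csv | sort | uniq -c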
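The `FNR==NR` one-liner above relies on `NR` counting records across all files while `FNR` restarts at 1 for each file, so the two are equal only while the first file is being read. A minimal sketch with throwaway files follows; the file contents and the names `tmpdir` and `only_in_second` are assumptions for the illustration.

```shell
# Create two small hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana > "$tmpdir/file1"
printf '%s\n' banana cherry apple date > "$tmpdir/file2"

# First file (FNR == NR): remember each line in the array a and move on.
# Second file: the bare pattern !a[$0] prints lines not seen in the first file.
only_in_second=$(awk 'FNR==NR { a[$0]++; next } !a[$0]' \
  "$tmpdir/file1" "$tmpdir/file2")

echo "$only_in_second"
rm -r "$tmpdir"
```

Here the output is `cherry` and `date`, the two lines of the second file that the first file does not contain.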