wael34218/AWK.md

## AWK.md

      
    AWK Basics

Introduction

AWK is named after its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is a utility for writing small but effective programs as a series of pattern-action statements: each pattern describes the text to look for in every line of the input, and each action says what to do when a line matches.
AWK Operations:

Scans a file line by line
Splits each input line into fields
Compares input line/fields to pattern
Performs action(s) on matched lines
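
The four operations above can be seen in a single one-liner. This is a minimal sketch; the sample names and scores are made up for illustration.

```shell
# Hypothetical sample input: one "name score" pair per line.
input='alice 90
bob 72
carol 85'

# awk reads the input line by line, splits each line on whitespace into
# $1 and $2, compares $2 against the pattern ($2 > 80), and runs the
# action (print the name) only on the lines that match.
high_scores=$(printf '%s\n' "$input" | awk '$2 > 80 { print $1 }')
echo "$high_scores"
```

Only `alice` and `carol` pass the pattern, so only their names are printed.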

AWK Program Structure

BEGIN{ BEGIN_ACTIONS; }
/regex1/ { ACTIONS; }
/regex2/ { ACTIONS; }
END{ END_ACTIONS; }

BEGIN is a special pattern that matches once, before any line of input has been read. This is where you initialize variables and any other state your script needs.
END is the counterpart: it matches once, after all input has been processed. This lets you clean up or print final output before exiting.
Example (sums field 6 of ps aux, the resident memory in KiB, over all lines containing Chrome and prints the total in MiB):
ps aux | awk 'BEGIN{total=0} /Chrome/ {total+=$6} END {print total/1024}'
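
Because the output of `ps aux` differs on every machine, the same BEGIN / pattern / END shape is easier to follow on fixed input. The sketch below assumes a made-up log format of `<level> <bytes>`.

```shell
# Hypothetical log lines: "<level> <bytes>".
log='INFO 2048
ERROR 1024
ERROR 3072'

# BEGIN runs once before any input, the /ERROR/ rule runs for every
# matching line, and END runs once after the last line.
error_kib=$(printf '%s\n' "$log" |
  awk 'BEGIN { total = 0 } /ERROR/ { total += $2 } END { print total / 1024 }')
echo "$error_kib"
```

The two ERROR lines add up to 4096 bytes, so the script prints `4`.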

Actions and Fields

Actions can be statements such as:

exit: Ends the program
next: Skips to the next line of input
print: Print to standard output
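
A short sketch of how the three statements interact. The input, including the `STOP` marker line, is invented for this example.

```shell
# Hypothetical input: comment lines start with "#", and a line
# containing only "STOP" marks where processing should end.
input='# header
one
two
STOP
three'

kept=$(printf '%s\n' "$input" | awk '
  /^#/         { next }   # skip comment lines, go to the next record
  $0 == "STOP" { exit }   # stop reading input entirely
               { print }  # a rule with no pattern runs on every remaining line
')
echo "$kept"
```

The header is skipped by `next`, and `three` is never reached because `exit` fires first, so the output is `one` and `two`.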

Fields:

$0: The entire line
$1, $2, ... : First field, Second field, ...
$NF: Last field
$(NF-1): Second-to-last field
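
As an illustration of the field references listed above, here is a small sketch on a made-up four-word line:

```shell
# Hypothetical input line with four whitespace-separated fields.
line='alpha beta gamma delta'

fields=$(printf '%s\n' "$line" |
  awk '{ print $1; print $NF; print $(NF-1); print NF }')
echo "$fields"
```

This prints `alpha`, `delta`, `gamma`, and `4`, one per line. Note that `NF` is the field count, while `$NF` is the content of the last field.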

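Built-in Variables

| Variable | Description |
| --- | --- |
| FS | Input field separator: a character or regular expression used to split each record into fields (default: whitespace) |
| RS | Input record separator (default: newline, so each line is a record) |
| NR | Number of records read so far, counted across all input files |
| NF | Number of fields in the current record |
| OFS | Output field separator (default: a single space) |
| ORS | Output record separator (default: newline) |
| FNR | Number of the current record within the current file |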
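
A minimal sketch of FS, OFS, NR, and NF working together. The colon-separated records are made up, loosely modeled on a passwd-style file.

```shell
# Hypothetical colon-separated records.
data='root:0:/root
alice:1000:/home/alice'

# -F: sets FS to ":". OFS is inserted between the comma-separated
# arguments of print, so each output line is "NR | NF | first field".
table=$(printf '%s\n' "$data" |
  awk -F: 'BEGIN { OFS = " | " } { print NR, NF, $1 }')
echo "$table"
```

Each line has three fields, so the output is `1 | 3 | root` followed by `2 | 3 | alice`.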


Functions

| Function | Description |
| --- | --- |
| length([string]) | Length of its argument taken as a string; with no argument, the length of $0 |
| index(s, t) | Position (counting from 1) in s where the string t first occurs, or 0 if it does not occur |
| match(s, r) | Position in s where the regular expression r first matches, or 0 if it does not match; also sets RSTART and RLENGTH |
| sub(r, t, s) | Substitutes t for the first match of the regular expression r in the string s ($0 if s is omitted); returns the number of substitutions made |
| gsub(r, t, s) | Same as sub, except that all matches of the regular expression are replaced |
| tolower(str) | Returns str converted to lowercase |
| toupper(str) | Returns str converted to uppercase |
| split(str, arr [, fsep ]) | Splits str on fsep (FS if omitted), stores the pieces in arr, and returns the number of pieces |


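gsub example (removes UTF-8 byte order marks from every line; the \x hex escapes work in gawk but are not required by POSIX, so other awk implementations may differ):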
awk '{ gsub(/\xef\xbb\xbf/,""); print }' sentences.txt > clean_sentences.txt
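
The string functions can be tried without any input file by putting everything in a BEGIN block. The sample string below is arbitrary.

```shell
result=$(awk 'BEGIN {
  s = "Hello, World"
  print length(s)                 # number of characters in s
  print index(s, "World")         # 1-based position of the substring
  print match(s, /o+/)            # 1-based position of the first regex match
  print toupper(s)
  n = split("a,b,c", parts, ",")  # n is the number of pieces
  print n, parts[2]
  t = s
  gsub(/o/, "0", t)               # modifies t in place
  print t
}')
echo "$result"
```

This prints `12`, `8`, `5`, `HELLO, WORLD`, `3 b`, and `Hell0, W0rld`, one per line. `sub` and `gsub` change their target in place and return a count, which is why the example copies `s` into `t` first.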

If Condition

ps aux | awk '{if(match($12, /Chrome/)){print $2": Chrome Process"}else{print $2": Non-chrome Process"}}'
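
The same if/else shape on fixed input, as a sketch. The `pid command` lines are invented, and the `~` match operator is used here as a shorter alternative to `match()` when the match position is not needed.

```shell
# Hypothetical "pid command" lines.
procs='101 chrome
102 bash
103 chrome'

labels=$(printf '%s\n' "$procs" | awk '{
  if ($2 ~ /chrome/) {
    print $1 ": Chrome process"
  } else {
    print $1 ": other process"
  }
}')
echo "$labels"
```

Lines 101 and 103 are labeled as Chrome processes, and 102 falls into the else branch.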

For Loop

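ps aux | awk 'BEGIN{total=0} /Chrome/ {for(i=1; i<=2; i++){print $2; total+=$6}} END {print total/1024}'
Note that the loop body runs twice for every matching line, so each PID is printed twice and total ends up at double the real sum. The example only illustrates the loop syntax.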
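
A more typical use of a for loop in awk is walking over the fields of a record with `NF` as the upper bound. A minimal sketch, with made-up numeric rows:

```shell
# Hypothetical input: rows of numbers with a varying number of fields.
sums=$(printf '1 2 3\n10 20\n' | awk '{
  total = 0
  for (i = 1; i <= NF; i++) total += $i   # $i is the i-th field
  print total
}')
echo "$sums"
```

The first row sums to `6` and the second to `30`.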

Passing Variables into AWK

var=Chrome
ps aux | awk -v a="$var" '{if(match($12, a)){print $2": "a}else{print $2": Non "a}}'
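
The `-v` option also works for numeric values. In this sketch the shell variable `threshold` and the awk variable `min` are names chosen for illustration.

```shell
threshold=80   # hypothetical shell variable

# -v min="$threshold" makes the shell value available inside awk as min.
passed=$(printf 'alice 90\nbob 72\n' |
  awk -v min="$threshold" '$2 >= min { print $1 }')
echo "$passed"
```

Only `alice` meets the threshold. Passing the value with `-v` avoids splicing shell variables into the awk program text, which keeps the quoting simple.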

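Extracting unique values between 2 files

This prints the lines of file2 that do not appear in file1. While the first file is being read, FNR equals NR, so every line is recorded in the array a and skipped; for the second file, only lines that were never recorded are printed.
awk 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2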
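
To see the two-file idiom in action, the sketch below builds two small throwaway files. The file contents and the array name `seen` are made up for the example.

```shell
# Create two hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmpdir/file1"
printf 'banana\ncherry\napple\ndate\n' > "$tmpdir/file2"

# First pass (FNR==NR): remember every line of file1.
# Second pass: print the lines of file2 that were not remembered.
only_in_file2=$(awk 'FNR==NR { seen[$0]++; next } !seen[$0]' \
  "$tmpdir/file1" "$tmpdir/file2")
echo "$only_in_file2"

rm -rf "$tmpdir"
```

The output is `cherry` and `date`, in the order they appear in the second file.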

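Dealing with CSV files that have commas inside quoted fields:

FPAT is a gawk extension that describes what a field looks like instead of what separates fields. The command below counts how many records have each number of fields, which is a quick check that quoted commas are being handled.
awk -v FPAT='([^,]*)|("[^"]+")' '{print NF}' file.csv | sort | uniq -c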