@wael34218
Last active December 16, 2021 16:22

AWK Basics

Introduction

AWK is named after its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is a utility for writing small but effective programs as pattern-action statements. Each statement names a text pattern to search for in every line of the input and the action to take when a line matches.

AWK Operations:

  • Scans a file line by line
  • Splits each input line into fields
  • Compares input line/fields to pattern
  • Performs action(s) on matched lines
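
As a minimal sketch of these four steps, the command below feeds three made-up records to awk. The names, ages, and the variable `adults` are illustrative and not part of the original notes.

```shell
# Each line is scanned, split on whitespace into $1 (name) and $2 (age),
# compared against the pattern $2 > 28, and printed only when it matches.
adults=$(printf 'alice 30\nbob 25\ncarol 41\n' | awk '$2 > 28 { print $1 }')
echo "$adults"
```

Only the names on lines whose second field exceeds 28 are printed.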

AWK Program Structure

BEGIN{ BEGIN_ACTIONS; }
/regex1/ { ACTIONS; }
/regex2/ { ACTIONS; }
END{ END_ACTIONS; }

BEGIN is a special pattern that matches once, before any line of input has been read. This is where you initialize variables and any other state your script needs.

END is its counterpart: it matches once, after all input has been processed. Use it to clean up or to print final output, such as totals, before exiting.

Example:

ps aux | awk 'BEGIN{total=0} /Chrome/ {total+=$6} END {print total/1024}'
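
The output of the `ps` example depends on which processes are running. A reproducible sketch of the same BEGIN / pattern / END shape, using made-up input where the second column stands in for a memory figure in kilobytes, could look like this:

```shell
# BEGIN runs once before any input, the /chrome/ rule runs for each
# matching line, and END runs once after the last line.
# The input lines and the variable name "summary" are illustrative only.
summary=$(printf 'chrome 2048\nvim 512\nchrome 1024\n' |
  awk 'BEGIN { total = 0 } /chrome/ { total += $2 } END { print total / 1024 }')
echo "$summary"
```

The two matching lines add up to 3072, so the END block prints 3.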

Actions and Fields

Actions can be statements such as:

  • exit: Ends the program
  • next: Skips to the next line of input
  • print: Prints to standard output

Fields:

  • $0: The entire line
  • $1, $2, ... : First field, Second field, ...
  • $NF: Last field
  • $(NF-1): Second-to-last field
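
A short sketch of these field references on a single made-up line (the input text and the variable `fields` are assumptions for illustration):

```shell
# Prints the first, last, and second-to-last fields, then the whole line.
fields=$(echo 'one two three four' |
  awk '{ print $1, $NF, $(NF-1); print $0 }')
echo "$fields"
```

Because NF is 4 for this line, $NF is the fourth field and $(NF-1) is the third.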

Built-in Variables

Variable  Description
FS        Input field separator: a string or regular expression used to split each record into fields (whitespace by default)
RS        Input record separator (newline by default)
NR        Number of the current record, counted across all input files
NF        Number of fields in the current record
OFS       Output field separator (a single space by default)
ORS       Output record separator (newline by default)
FNR       Number of the current record within the current file
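
To see several of these variables at work, here is a small sketch on made-up colon-separated input. It sets FS and OFS in a BEGIN block and prints NR and NF for each record; the data and the variable `report` are illustrative.

```shell
# FS controls how input is split; OFS is placed between the
# comma-separated arguments of print.
report=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
echo "$report"
```

Each output line holds the record number, the field count, the first field, and the last field, joined by OFS.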

Functions

Function                  Description
length([string])          Length of the argument taken as a string; with no argument, the length of $0
index(s, t)               Position in s where the string t first occurs, or 0 if it does not occur
match(s, r)               Position in s where the regular expression r first matches, or 0 if it does not; also sets RSTART and RLENGTH
sub(r, t [, s])           Substitutes t for the first match of the regular expression r in the string s ($0 if s is omitted)
gsub(r, t [, s])          Same as sub, except that every match of r is replaced
tolower(str)              Returns str converted to lowercase
toupper(str)              Returns str converted to uppercase
split(str, arr [, fsep])  Splits str on fsep (FS if omitted) into the array arr and returns the number of elements

gsub example, removing the UTF-8 byte order mark (bytes EF BB BF) from each line:

awk '{ gsub(/\xef\xbb\xbf/,""); print }' sentences.txt > clean_sentences.txt
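
The other string functions can be tried out in the same way. The sketch below runs several of them on the made-up line `hello world`; the variable names inside the program are illustrative.

```shell
# Exercises split, index, match, length, sub, and toupper on one line.
result=$(echo 'hello world' | awk '{
  n   = split($0, parts, " ")   # number of pieces; parts[1] is "hello"
  pos = index($0, "world")      # position of a fixed substring
  m   = match($0, /o w/)        # position of the first regex match
  len = length($1)              # length of the first field
  sub(/world/, "awk")           # no third argument, so $0 is edited
  print n, pos, m, len, toupper($0)
}')
echo "$result"
```

This prints `2 7 5 5 HELLO AWK`: two pieces, "world" at position 7, the regex match at position 5, a first field of length 5, and the edited line in uppercase.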

If Condition

ps aux | awk '{if(match($12, /Chrome/)){print $2": Chrome Process"}else{print $2": Non-chrome Process"}}'
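
Since the field positions in `ps aux` output vary between systems, a reproducible sketch of the same if/else shape may help. It uses made-up two-column input (name, then ID) rather than real process data.

```shell
# match() returns a nonzero position when the regex is found, so it can
# serve directly as the if condition.
labels=$(printf 'chrome 101\nvim 202\n' |
  awk '{ if (match($1, /chrome/)) { print $2 ": Chrome process" } else { print $2 ": non-Chrome process" } }')
echo "$labels"
```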

For Loop

ps aux | awk 'BEGIN{total=0} /Chrome/ {for(i=1; i<=2; i++){print $2; total+=$6}} END {print total/1024}'
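
A common use of the for loop is to walk over every field of a record with NF as the upper bound. The following sketch sums the numbers on each line of made-up input; it illustrates the loop syntax and is not taken from the original notes.

```shell
# Reset sum for each record, add up fields 1..NF, then print the total.
sums=$(printf '1 2 3\n10 20\n' |
  awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum }')
echo "$sums"
```

The first line sums to 6 and the second to 30.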

Passing Variables into AWK

var=Chrome
ps aux | awk -v a="$var" '{if(match($12, a)){print $2": "a}else{print $2": Non "a}}'
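
The same `-v` mechanism, sketched on fixed input so the result does not depend on running processes. The shell variable `needle` and the awk variable `pat` are hypothetical names chosen for this example.

```shell
# -v copies the shell value into an awk variable before the program runs;
# used on the right of ~, the string is treated as a regular expression.
needle=vim
matches=$(printf 'chrome 101\nvim 202\n' |
  awk -v pat="$needle" '$1 ~ pat { print $2 ": " pat }')
echo "$matches"
```

Only the second line matches, so the output is `202: vim`.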

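Extracting unique values between 2 files

The command below prints the lines of file2 that do not appear in file1. While the first file is read, FNR equals NR, so each line is recorded in the array a; for the second file, only lines not recorded in a are printed.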

awk 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2
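
To try the two-file idiom without touching real data, the sketch below builds two small temporary files. The file contents, the temporary directory, and the array name `seen` are assumptions made for this example.

```shell
# Create two throwaway files, then print the lines of the second file
# that never occurred in the first.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmpdir/file1"
printf 'banana\ncherry\napple\ndate\n' > "$tmpdir/file2"
only_in_2=$(awk 'FNR == NR { seen[$0]++; next } !seen[$0]' "$tmpdir/file1" "$tmpdir/file2")
echo "$only_in_2"
rm -r "$tmpdir"
```

This prints `cherry` and `date`, the two lines of the second file that the first file does not contain. Note that the result is not symmetric: lines found only in the first file are not reported.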

Dealing with CSV files that have commas inside quoted fields (FPAT requires GNU awk):

awk -v FPAT='([^,]*)|("[^"]+")' '{print NF}' file.csv | sort | uniq -c
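
As a hedged sketch of what FPAT does, the snippet below runs the same pattern on one made-up CSV record with a comma inside a quoted field. It assumes GNU awk is installed as `gawk` and prints a notice otherwise; the sample record and the variable `csv_fields` are illustrative.

```shell
# FPAT describes what a field looks like (here: a run of non-commas, or a
# double-quoted string) instead of what separates fields.
if command -v gawk >/dev/null 2>&1; then
  csv_fields=$(printf '1,"Smith, John",NY\n' |
    gawk -v FPAT='([^,]*)|("[^"]+")' '{ print NF ": " $2 }')
  echo "$csv_fields"
else
  echo "gawk not installed; FPAT is unavailable" >&2
fi
```

With plain `-F,` the same record would be split into four fields, because the comma inside the quotes would count as a separator. Note that the quotes remain part of the field value.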